954 resultados para Noisy data
Resumo:
Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.
Resumo:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distribution and/or non-linear inter-variable relationship. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions, which is then utilized to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Resumo:
The Echology: Making Sense of Data initiative seeks to break new ground in arts practice by asking artists to innovate with respect to a) the possible forms of data representation in public art and b) the artist's role in engaging publics on environmental sustainability in new urban developments. Initiated by ANAT and Carbon Arts in 2011, Echology has seen three artists selected by National competition in 2012 for Lend Lease sites across Australia. In 2013 commissioning of one of these works, the Mussel Choir by Natalie Jeremijenko, began in Melbourne's Victoria Harbour development. This emerging practice of data - driven and environmentally engaged public artwork presents multiple challenges to established systems of public arts production and management, at the same time as offering up new avenues for artists to forge new modes of collaboration. The experience of Echology and in particular, the Mussel Choir is examined here to reveal opportunities for expansion of this practice through identification of the factors that lead to a resilient 'ecology of part nership' between stakeholders that include science and technology researchers, education providers, city administrators, and urban developers.
Resumo:
Discovering the means to prevent and cure schizophrenia is a vision that motivates many scientists. But in order to achieve this goal, we need to understand its neurobiological basis. The emergent metadiscipline of cognitive neuroscience fields an impressive array of tools that can be marshaled towards achieving this goal, including powerful new methods of imaging the brain (both structural and functional) as well as assessments of perceptual and cognitive capacities based on psychophysical procedures, experimental tasks and models developed by cognitive science. We believe that the integration of data from this array of tools offers the greatest possibilities and potential for advancing understanding of the neural basis of not only normal cognition but also the cognitive impairments that are fundamental to schizophrenia. Since sufficient expertise in the application of these tools and methods rarely reside in a single individual, or even a single laboratory, collaboration is a key element in this endeavor. Here, we review some of the products of our integrative efforts in collaboration with our colleagues on the East Coast of Australia and Pacific Rim. This research focuses on the neural basis of executive function deficits and impairments in early auditory processing in patients using various combinations of performance indices (from perceptual and cognitive paradigms), ERPs, fMRI and sMRI. In each case, integration of two or more sources of information provides more information than any one source alone by revealing new insights into structure-function relationships. Furthermore, the addition of other imaging methodologies (such as DTI) and approaches (such as computational models of cognition) offers new horizons in human brain imaging research and in understanding human behavior.
Resumo:
Although there are many potential new insights to be gained through advancing research on the clients of male sex workers, significant social, ethical and methodological challenges to accessing this population exist. This research project case explores our attempts to recruit a population that does not typically form a cohesive or coherent 'community' and often avoids self-identifying to mitigate the stigma attached to buying sex. We used an arms-length recruitment campaign that focussed on directing potential participants to our study website, which could in turn lead them to participate in an anonymous telephone interview. Barriers to reaching male sex-work clients, however, demanded the evolution of our recruitment strategy. New technologies are part of the solution to accessing a hard-to-reach population, but they only work if researchers engage responsively. We also show how we conducted an in-depth interview with a client and discuss the value of using secondary data.
Resumo:
A tag-based item recommendation method generates an ordered list of items, likely interesting to a particular user, using the users past tagging behaviour. However, the users tagging behaviour varies in different tagging systems. A potential problem in generating quality recommendation is how to build user profiles, that interprets user behaviour to be effectively used, in recommendation models. Generally, the recommendation methods are made to work with specific types of user profiles, and may not work well with different datasets. In this paper, we investigate several tagging data interpretation and representation schemes that can lead to building an effective user profile. We discuss the various benefits a scheme brings to a recommendation method by highlighting the representative features of user tagging behaviours on a specific dataset. Empirical analysis shows that each interpretation scheme forms a distinct data representation which eventually affects the recommendation result. Results on various datasets show that an interpretation scheme should be selected based on the dominant usage in the tagging data (i.e. either higher amount of tags or higher amount of items present). The usage represents the characteristic of user tagging behaviour in the system. The results also demonstrate how the scheme is able to address the cold-start user problem.
Resumo:
Bactrocera papayae Drew & Hancock, Bactrocera philippinensis Drew & Hancock, Bactrocera carambolae Drew & Hancock, and Bactrocera invadens Drew, Tsuruta & White are four horticultural pest tephritid fruit fly species that are highly similar, morphologically and genetically, to the destructive pest, the Oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae). This similarity has rendered the discovery of reliable diagnostic characters problematic, which, in view of the economic importance of these taxa and the international trade implications, has resulted in ongoing difficulties for many areas of plant protection and food security. Consequently, a major international collaborative and integrated multidisciplinary research effort was initiated in 2009 to build upon existing literature with the specific aim of resolving biological species limits among B. papayae, B. philippinensis, B. carambolae, B. invadens and B. dorsalis to overcome constraints to pest management and international trade. Bactrocera philippinensis has recently been synonymized with B. papayae as a result of this initiative and this review corroborates that finding; however, the other names remain in use. While consistent characters have been found to reliably distinguish B. carambolae from B. dorsalis, B. invadens and B. papayae, no such characters have been found to differentiate the latter three putative species. We conclude that B. carambolae is a valid species and that the remaining taxa, B. dorsalis, B. invadens and B. papayae, represent the same species. Thus, we consider B. dorsalis (Hendel) as the senior synonym of B. papayae Drew and Hancock syn.n. and B. invadens Drew, Tsuruta & White syn.n. A redescription of B. dorsalis is provided. Given the agricultural importance of B. dorsalis, this taxonomic decision will have significant global plant biosecurity implications, affecting pest management, quarantine, international trade, postharvest treatment and basic research. Throughout the paper, we emphasize the value of independent and multidisciplinary tools in delimiting species, particularly in complicated cases involving morphologically cryptic taxa.
Resumo:
This thesis presents a novel program parallelization technique incorporating with dynamic and static scheduling. It utilizes a problem specific pattern developed from the prior knowledge of the targeted problem abstraction. Suitable for solving complex parallelization problems such as data intensive all-to-all comparison constrained by memory, the technique delivers more robust and faster task scheduling compared to the state-of-the art techniques. Good performance is achieved from the technique in data intensive bioinformatics applications.
Resumo:
This pilot project investigated the existing practices and processes of Proficient, Highly Accomplished and Lead teachers in the interpretation, analysis and implementation of National Assessment Program – Literacy and Numeracy (NAPLAN) data. A qualitative case study approach was the chosen methodology, with nine teachers across a variety of school sectors interviewed. Themes and sub-themes were identified from the participants’ interview responses revealing the ways in which Queensland teachers work with NAPLAN data. The data illuminated that generally individual schools and teachers adopted their own ways of working with data, with approaches ranging from individual/ad hoc, to hierarchical or a whole school approach. Findings also revealed that data are the responsibility of various persons from within the school hierarchy; some working with the data electronically whilst others rely on manual manipulation. Manipulation of data is used for various purposes including tracking performance, value adding and targeting programmes for specific groups of students, for example the gifted and talented. Whilst all participants had knowledge of intervention programmes and how practice could be modified, there were large inconsistencies in knowledge and skills across schools. Some see the use of data as a mechanism for accountability, whilst others mention data with regards to changing the school culture and identifying best practice. Overall, the findings showed inconsistencies in approach to focus area 5.4. Recommendations therefore include a more national approach to the use of educational data.
Resumo:
In this chapter, we draw out the relevant themes from a range of critical scholarship from the small body of digital media and software studies work that has focused on the politics of Twitter data and the sociotechnical means by which access is regulated. We highlight in particular the contested relationships between social media research (in both academic and non-academic contexts) and the data wholesale, retail, and analytics industries that feed on them. In the second major section of the chapter we discuss in detail the pragmatic edge of these politics in terms of what kinds of scientific research is and is not possible in the current political economy of Twitter data access. Finally, at the end of the chapter we return to the much broader implications of these issues for the politics of knowledge, demonstrating how the apparently microscopic level of how the Twitter API mediates access to Twitter data actually inscribes and influences the macro level of the global political economy of science itself, through re-inscribing institutional and traditional disciplinary privilege We conclude with some speculations about future developments in data rights and data philanthropy that may at least mitigate some of these negative impacts.
Resumo:
Monitoring the environment with acoustic sensors is an effective method for understanding changes in ecosystems. Through extensive monitoring, large-scale, ecologically relevant, datasets can be produced that can inform environmental policy. The collection of acoustic sensor data is a solved problem; the current challenge is the management and analysis of raw audio data to produce useful datasets for ecologists. This paper presents the applied research we use to analyze big acoustic datasets. Its core contribution is the presentation of practical large-scale acoustic data analysis methodologies. We describe details of the data workflows we use to provide both citizen scientists and researchers practical access to large volumes of ecoacoustic data. Finally, we propose a work in progress large-scale architecture for analysis driven by a hybrid cloud-and-local production-grade website.
Resumo:
This program of research linked police and health data collections to investigate the potential benefits for road safety in terms of enhancing the quality of data. This research has important implications for road safety because, although police collected data has historically underpinned efforts in the area, it is known that many road crashes are not reported to police and that these data lack specific injury severity information. This research shows that data linkage provides a more accurate quantification of the severity and prevalence of road crash injuries which is essential for: prioritising funding; targeting interventions; and estimating the burden and cost of road trauma.
Resumo:
Transit passenger market segmentation enables transit operators to target different classes of transit users for targeted surveys and various operational and strategic planning improvements. However, the existing market segmentation studies in the literature have been generally done using passenger surveys, which have various limitations. The smart card (SC) data from an automated fare collection system facilitate the understanding of the multiday travel pattern of transit passengers and can be used to segment them into identifiable types of similar behaviors and needs. This paper proposes a comprehensive methodology for passenger segmentation solely using SC data. After reconstructing the travel itineraries from SC transactions, this paper adopts the density-based spatial clustering of application with noise (DBSCAN) algorithm to mine the travel pattern of each SC user. An a priori market segmentation approach then segments transit passengers into four identifiable types. The methodology proposed in this paper assists transit operators to understand their passengers and provides them oriented information and services.
Resumo:
High-Order Co-Clustering (HOCC) methods have attracted high attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects as well as object affinity information, i.e., intra-type relationships amongst the same types of objects. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created based on two-types of intra-type relationships learnt using p-nearest-neighbour graph and multiple subspaces learning. Moreover, in order to make sure the robustness of clustering process, we introduce a sparse error matrix into matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over the state-of-art HOCC methods for FScore and NMI.