826 results for Data mining models
Abstract:
In the field of process mining, the use of event logs for the purpose of root cause analysis is increasingly studied. In such an analysis, the availability of attributes/features that may explain the root cause of some phenomena is crucial. Currently, the process of obtaining these attributes from raw event logs is performed more or less on a case-by-case basis: there is still no generalized, systematic approach that captures this process. This paper proposes a systematic approach to enrich and transform event logs in order to obtain the required attributes for root cause analysis using classical data mining techniques, specifically classification techniques. This approach is formalized and its applicability has been validated using both self-generated and publicly available logs.
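As a rough illustration of what enriching an event log into case-level attributes might look like, the sketch below groups events by case and derives a few features that a classifier could later consume. The field names (`case_id`, `activity`, `timestamp`), the derived attributes, and the function `enrich_cases` are all hypothetical choices made for this example; they are not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime


def enrich_cases(events):
    """Group events by case and derive case-level attributes (hypothetical feature set)."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)

    rows = {}
    for case_id, case_events in by_case.items():
        # Order the events of each case chronologically.
        case_events.sort(key=lambda e: e["timestamp"])
        first, last = case_events[0], case_events[-1]
        activities = [e["activity"] for e in case_events]
        rows[case_id] = {
            "n_events": len(case_events),
            "duration_hours": (last["timestamp"] - first["timestamp"]).total_seconds() / 3600,
            # Assumption: an activity occurring more than once counts as rework.
            "has_rework": len(set(activities)) < len(activities),
        }
    return rows


# Tiny made-up log, for illustration only.
log = [
    {"case_id": "c1", "activity": "register", "timestamp": datetime(2024, 1, 1, 9)},
    {"case_id": "c1", "activity": "check", "timestamp": datetime(2024, 1, 1, 10)},
    {"case_id": "c1", "activity": "check", "timestamp": datetime(2024, 1, 1, 13)},
    {"case_id": "c2", "activity": "register", "timestamp": datetime(2024, 1, 2, 9)},
    {"case_id": "c2", "activity": "check", "timestamp": datetime(2024, 1, 2, 10)},
]
features = enrich_cases(log)
```

Each row of `features` could then be paired with an outcome label (for example, whether the case was late) and passed to any standard classification technique.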
Abstract:
Data quality has become a major concern for organisations. The rapid growth in the size and technology of databases and data warehouses has brought significant advantages in accessing, storing, and retrieving information. At the same time, great challenges arise with rapid data throughput and heterogeneous accesses in terms of maintaining high data quality. Yet, despite the importance of data quality, the literature has usually condensed data quality into detecting and correcting poor data such as outliers and incomplete or inaccurate values. As a result, organisations are unable to efficiently and effectively assess data quality. Having an accurate and proper data quality assessment method will enable users to benchmark their systems and monitor their improvement. This paper introduces a granule mining approach for measuring the degree of randomness of erroneous data, which will enable decision makers to conduct accurate quality assessment and locate the most severely affected data, thereby providing an accurate estimation of human and financial resources for conducting quality improvement tasks.
Abstract:
Monitoring the natural environment is increasingly important as habitat degradation and climate change reduce the world’s biodiversity. We have developed software tools and applications to assist ecologists with the collection and analysis of acoustic data at large spatial and temporal scales. One of our key objectives is automated animal call recognition, and our approach has three novel attributes. First, we work with raw environmental audio, contaminated by noise and artefacts and containing calls that vary greatly in volume depending on the animal’s proximity to the microphone. Second, initial experimentation suggested that no single recognizer could deal with the enormous variety of calls. Therefore, we developed a toolbox of generic recognizers to extract invariant features for each call type. Third, many species are cryptic and offer little data with which to train a recognizer. Many popular machine learning methods require large volumes of training and validation data and considerable time and expertise to prepare. Consequently, we adopt bootstrap techniques that can be initiated with little data and refined subsequently. In this paper, we describe our recognition tools and present results for real ecological problems.
Abstract:
IT-supported field data management benefits on-site construction management by improving accessibility to information and promoting efficient communication between project team members. However, most on-site safety inspections still rely heavily on subjective judgment and manual reporting processes, and thus observers’ experience often determines the quality of risk identification and control. This study aims to develop a methodology to efficiently retrieve safety-related information so that safety inspectors can easily access the relevant site safety information for safer decision making. The proposed methodology consists of three stages: (1) development of a comprehensive safety database which contains information on risk factors, accident types, impact of accidents and safety regulations; (2) identification of relationships among different risk factors based on statistical analysis methods; and (3) user-specified information retrieval using data mining techniques for safety management. This paper presents the overall methodology and preliminary results of the first-stage research, conducted with 101 accident investigation reports.
Abstract:
Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as those present in an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contribution of the side and centrally oriented cameras to improving visual speech recognition accuracy. Finally, combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Abstract:
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing useful work by preventing ineffective clusterings, which provide no useful result, from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
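The idea of subtracting a random baseline can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses purity as the quality measure, and it builds the baseline by shuffling the cluster assignments, which keeps the cluster sizes unchanged. The function names and the number of trials are hypothetical, and the paper's exact baseline construction may differ.

```python
import random
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents that belong to the majority category of their cluster."""
    members = {}
    for cluster, label in zip(clusters, labels):
        members.setdefault(cluster, []).append(label)
    majority = sum(Counter(ls).most_common(1)[0][1] for ls in members.values())
    return majority / len(labels)


def divergence_from_random(clusters, labels, measure=purity, trials=100, seed=0):
    """Score of a clustering minus the mean score of size-preserving random clusterings."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    baseline = 0.0
    for _ in range(trials):
        # Assumption: shuffling assignments gives a random clustering with the same cluster sizes.
        rng.shuffle(shuffled)
        baseline += measure(shuffled, labels)
    return measure(clusters, labels) - baseline / trials


labels = ["a"] * 4 + ["b"] * 4
good = [0, 0, 0, 0, 1, 1, 1, 1]   # clusters match the categories
singletons = list(range(8))       # every document in its own cluster

good_score = divergence_from_random(good, labels)
# Singletons have perfect purity, but so does every random baseline, so the divergence is zero.
singleton_score = divergence_from_random(singletons, labels)
```

The singleton clustering shows why the correction matters: raw purity rewards it with a perfect score even though it provides no useful grouping, whereas its divergence from the random baseline is zero.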
Abstract:
Breast cancer is a leading contributor to the burden of disease in Australia. Fortunately, the recent introduction of diverse therapeutic strategies has improved the survival outcome for many women. Despite this, the clinical management of breast cancer remains problematic as not all approaches are sufficiently sophisticated to take into account the heterogeneity of this disease and are unable to predict disease progression, in particular, metastasis. As such, women with good prognostic outcomes are exposed to the side effects of therapies without added benefit. Furthermore, women with aggressive disease for whom these advanced treatments would deliver benefit cannot be distinguished and opportunities for more intensive or novel treatment are lost. This study is designed to identify novel factors associated with disease progression, and the potential to inform disease prognosis. Frequently overlooked, yet common mediators of disease are the interactions that take place between the insulin-like growth factor (IGF) system and the extracellular matrix (ECM). Our laboratory has previously demonstrated that multiprotein insulin-like growth factor-I (IGF-I): insulin-like growth factor binding protein (IGFBP): vitronectin (VN) complexes stimulate migration of breast cancer cells in vitro, via the cooperative involvement of the insulin-like growth factor type I receptor (IGF-IR) and VN-binding integrins. However, the effects of IGF and ECM protein interactions on the dissemination and progression of breast cancer in vivo are unknown. It was hypothesised that interactions between proteins required for IGF-induced signalling events and those within the ECM contribute to breast cancer metastasis and are prognostic and predictive indicators of patient outcome.
To address this hypothesis, semiquantitative immunohistochemistry (IHC) analyses were performed to compare the extracellular and subcellular distribution of IGF- and ECM-induced signalling proteins between matched normal, primary cancer, and metastatic cancer among archival formalin-fixed paraffin-embedded (FFPE) breast tissue samples collected from women attending the Princess Alexandra Hospital, Brisbane. Multivariate Cox proportional hazards (PH) regression survival models in conjunction with a modified 'purposeful selection of covariates' method were applied to determine the prognostic potential of these proteins. This study provides the first in-depth, compartmentalised analysis of the distribution of IGF- and ECM-induced signalling proteins. As protein function and protein localisation are closely correlated, these findings provide novel insights into IGF signalling and ECM protein function during breast cancer development and progression. Distinct IGF signalling and ECM protein immunoreactivity was observed in the stroma and/or in subcellular locations in normal breast, primary cancer and metastatic cancer tissues. Analysis of the presence and location of stratifin (SFN) suggested a causal relationship in ECM remodelling events during breast cancer development and progression. The results of this study have also suggested that fibronectin (FN) and β1 integrin are important for the formation of invadopodia and epithelial-to-mesenchymal transition (EMT) events.
Our data also highlighted the importance of the temporal and spatial distribution of IGF-induced signalling proteins in breast cancer metastasis; in particular, SFN, enhancer-of-split and hairy-related protein 2 (SHARP-2), total-akt/protein kinase B 1 (Total-AKT1), phosphorylated-akt/protein kinase B (P-AKT), extracellular signal-regulated kinase-1 and extracellular signal-regulated kinase-2 (ERK1/2) and phosphorylated-extracellular signal-regulated kinase-1 and extracellular signal-regulated kinase-2 (P-ERK1/2). Multivariate survival models were created from the immunohistochemical data. These models were found to fit the data well, with very high statistical confidence. Numerous prognostic confounding effects and effect modifications were identified among elements of the ECM and IGF signalling cascade and corroborate the survival models. This finding provides further evidence for the prognostic potential of IGF- and ECM-induced signalling proteins. In addition, the adjusted measures of association obtained in this study have strengthened the validity and utility of the resulting models. The findings from this study provide insights into the biological interactions that occur during the development of breast tissue and contribute to disease progression. Importantly, these multivariate survival models could provide important prognostic and predictive indicators that assist the clinical management of breast disease, namely in the early identification of cancers with a propensity to metastasise, and/or recur following adjuvant therapy. The outcomes of this study further inform the development of new therapeutics to aid patient recovery. The findings from this study have widespread clinical application in the diagnosis of disease and prognosis of disease progression, and inform the most appropriate clinical management of individuals with breast cancer.
Abstract:
This technical report describes the methods used to obtain a list of acoustic indices that are used to characterise the structure and distribution of acoustic energy in recordings of the natural environment. In particular it describes methods for noise reduction from recordings of the environment and a fast clustering algorithm used to estimate the spectral richness of long recordings.
Abstract:
Emergence has the potential to effect complex, creative or open-ended interactions and novel game-play. We report on research into an emergent interactive system. The research investigates emergent user behaviors and experience through the creation and evaluation of an interactive system. The system is +-NOW, an augmented reality, tangible, interactive art system. The paper briefly describes the qualities of emergence and +-NOW before focusing on its evaluation. This was a qualitative study with 30 participants conducted in context. Data analysis followed Grounded Theory Methods. Coding schemes, induced from data and external literature, are presented. Findings show that emergence occurred for over half of the participants. The nature of these emergent behaviors is discussed along with examples from the data. Other findings indicate that participants found interaction with the work satisfactory. Design strategies for facilitating satisfactory experience despite the often unpredictable character of emergence are briefly reviewed and potential application areas for emergence are discussed.
Abstract:
The interactive art system +-now is modelled on the openness of the natural world. Emergent shapes constitute a novel method for facilitating this openness. With the art system as an example, the relationship between openness and emergence is discussed. Lastly, artist reflections from the creation of the work are presented. These describe the nature of open systems and how they may be created.
Abstract:
Glass Pond is an interactive artwork designed to engender exploration and reflection through an intuitive, tangible interface and a simulation agent. It is being developed using iterative methods. A study has been conducted with the aim of illuminating user experience, interface, design, and performance issues. The paper describes the study methodology and process of data analysis, including coding schemes for cognitive states and movements. Analysis reveals that exploration and reflection occurred, as well as unexpected composing behaviours. Results also show that participants interacted to varying degrees. Design discussion includes the artwork's novel interface and configuration.
Abstract:
Emergence is discussed in the context of a practice-based study of interactive art and a new taxonomy of emergence is proposed. The interactive art system ‘plus minus now’ is described and its relationship to emergence is discussed. ‘Plus minus now’ uses a novel method for instantiating emergent shapes. A preliminary investigation of this art system has been conducted and reveals the creation of temporal compositions by a participant. These temporal compositions and the emergent shapes are described using the taxonomy of emergence. Characteristics of emergent interactions and the implications of designing for them are discussed.
Abstract:
This thesis is concerned with creating and evaluating interactive art systems that facilitate emergent participant experiences. For the purposes of this research, interactive art refers to computer-based art involving physical participation from the audience, while emergence is when a new form or concept appears that was not directly implied by the context from which it arose. This emergent ‘whole’ is more than a simple sum of its parts. The research aims to develop understanding of the nature of emergent experiences that might arise during participant interaction with interactive art systems. It also aims to understand the design issues surrounding the creation of these systems. The approach used is Practice-based, integrating practice, evaluation and theoretical research. Practice used methods from Reflection-in-action and Iterative design to create two interactive art systems: Glass Pond and +-now. Creation of +-now resulted in a novel method for instantiating emergent shapes. Both art works were also evaluated in exploratory studies. In addition, a main study with 30 participants was conducted on participant interaction with +-now. These sessions were video recorded and participants were interviewed about their experience. Recordings were transcribed and analysed using Grounded theory methods. Emergent participant experiences were identified and classified using a taxonomy of emergence in interactive art. This taxonomy draws on theoretical research. The outcomes of this Practice-based research are summarised as follows. Two interactive art systems, where the second work clearly facilitates emergent interaction, were created. Their creation involved the development of a novel method for instantiating emergent shapes and it informed aesthetic and design issues surrounding interactive art systems for emergence. A taxonomy of emergence in interactive art was also created.
Other outcomes are the evaluation findings about participant experiences, including different types of emergence experienced and the coding schemes produced during data analysis.