375 resultados para MULTI-RELATIONAL DATA MINING
Resumo:
The ability to build high-fidelity 3D representations of the environment from sensor data is critical for autonomous robots. Multi-sensor data fusion allows for more complete and accurate representations. Furthermore, using distinct sensing modalities (i.e. sensors using a different physical process and/or operating at different electromagnetic frequencies) usually leads to more reliable perception, especially in challenging environments, as modalities may complement each other. However, they may react differently to certain materials or environmental conditions, leading to catastrophic fusion. In this paper, we propose a new method to reliably fuse data from multiple sensing modalities, including in situations where they detect different targets. We first compute distinct continuous surface representations for each sensing modality, with uncertainty, using Gaussian Process Implicit Surfaces (GPIS). Second, we perform a local consistency test between these representations, to separate consistent data (i.e. data corresponding to the detection of the same target by the sensors) from inconsistent data. The consistent data can then be fused together, using another GPIS process, and the rest of the data can be combined as appropriate. The approach is first validated using synthetic data. We then demonstrate its benefit using a mobile robot, equipped with a laser scanner and a radar, which operates in an outdoor environment in the presence of large clouds of airborne dust and smoke.
Resumo:
Variations that exist in the treatment of patients (with similar symptoms) across different hospitals do substantially impact the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between the practices across different hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms across four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis, and scientific workflows were applied to facilitate such comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability, and experimental repeatability are also presented.
Resumo:
This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.
Resumo:
To the trained-eye, experts can often identify a team based on their unique style of play due to their movement, passing and interactions. In this paper, we present a method which can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps. We show how our approach is significantly better at identifying different teams compared to standard measures (i.e., shots, passes etc.). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.
Resumo:
This thesis presents an association rule mining approach, association hierarchy mining (AHM). Different to the traditional two-step bottom-up rule mining, AHM adopts one-step top-down rule mining strategy to improve the efficiency and effectiveness of mining association rules from datasets. The thesis also presents a novel approach to evaluate the quality of knowledge discovered by AHM, which focuses on evaluating information difference between the discovered knowledge and the original datasets. Experiments performed on the real application, characterizing network traffic behaviour, have shown that AHM achieves encouraging performance.
Resumo:
Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
Resumo:
This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree structure guided scheme based enumeration approach is defined using BOCF for systematically enumerating the valid subtrees only. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. The extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.
Resumo:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distribution and/or non-linear inter-variable relationship. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions, which is then utilized to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Resumo:
A tag-based item recommendation method generates an ordered list of items, likely interesting to a particular user, using the users past tagging behaviour. However, the users tagging behaviour varies in different tagging systems. A potential problem in generating quality recommendation is how to build user profiles, that interprets user behaviour to be effectively used, in recommendation models. Generally, the recommendation methods are made to work with specific types of user profiles, and may not work well with different datasets. In this paper, we investigate several tagging data interpretation and representation schemes that can lead to building an effective user profile. We discuss the various benefits a scheme brings to a recommendation method by highlighting the representative features of user tagging behaviours on a specific dataset. Empirical analysis shows that each interpretation scheme forms a distinct data representation which eventually affects the recommendation result. Results on various datasets show that an interpretation scheme should be selected based on the dominant usage in the tagging data (i.e. either higher amount of tags or higher amount of items present). The usage represents the characteristic of user tagging behaviour in the system. The results also demonstrate how the scheme is able to address the cold-start user problem.