360 results for Multimedia Data Mining
Abstract:
A people-to-people matching system (or match-making system) is a system that users join with the objective of meeting other users who share a common need. Real-world examples include employer-employee (in job search networks), mentor-student (in university social networks), consumer-to-consumer (in marketplaces) and male-female (in online dating networks) matching. The network underlying these systems consists of two groups of users, and the relationships between users need to be captured in order to develop an efficient match-making system. Most existing studies use information either about each user in isolation or about user interactions, and develop recommender systems from only one of these forms of information. It is imperative to understand the linkages among the users in the network and to use them in developing a match-making system. This study applies several social network analysis methods, such as graph theory, the small-world phenomenon, centrality analysis and density analysis, to gain insight into the entities and relationships present in this network. The paper also proposes a new type of graph called the "attributed bipartite graph". Using these analyses and the proposed type of graph, an efficient hybrid recommender system is developed that generates recommendations for new users and shows improved accuracy over the baseline methods.
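As a rough illustration of the density and centrality analyses mentioned in the abstract above, the following Python sketch builds a tiny two-group network with node attributes and computes its bipartite density and degree centrality. The user names, attributes and edges are invented for illustration, and the structure is only an assumption: the abstract does not define the paper's "attributed bipartite graph", so this should not be read as that definition.

```python
# Hypothetical toy data: two user groups with attributes and cross-group links.
left = {"m1": {"age": 30}, "m2": {"age": 41}}
right = {"f1": {"age": 29}, "f2": {"age": 35}, "f3": {"age": 44}}
edges = {("m1", "f1"), ("m1", "f2"), ("m2", "f2")}


def bipartite_density(left, right, edges):
    """Fraction of possible cross-group links that are present."""
    possible = len(left) * len(right)
    return len(edges) / possible if possible else 0.0


def degree_centrality(left, right, edges):
    """Degree of each node divided by the size of the opposite group.

    Assumes both groups are non-empty.
    """
    degree = {node: 0 for node in list(left) + list(right)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        node: degree[node] / (len(right) if node in left else len(left))
        for node in degree
    }


density = bipartite_density(left, right, edges)
centrality = degree_centrality(left, right, edges)
```

In this toy network half of the possible cross-group links exist, and `f2` is linked to every user in the opposite group, so it has the highest centrality.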
Abstract:
This paper describes the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) tasks of the 2013 MediaEval Benchmark. We extended a constrained clustering algorithm for the first task, semi-supervised clustering, and compared several classifiers, with Latent Dirichlet Allocation as the feature selector, for the second task, event classification. The proposed approach focuses on scalability and efficient memory allocation when applied to high-dimensional data with large clusters. Results for the first task show the effectiveness of the proposed method. Results for the second task indicate that attention to the imbalanced category distribution is needed.
Abstract:
Trees are capable of representing the semi-structured data that is common in the web domain. Finding similarities between trees is essential for several applications that deal with semi-structured data. Existing similarity methods compare a pair of trees through their nodes and paths to compute the similarity between them. However, these methods give poor results for unordered tree data and have NP-hard or MAX-SNP-hard complexity. In this paper, we present a novel method that first encodes a tree using an optimal traversal approach and then uses this encoding to model the tree with an equivalent matrix representation, so that similarity between unordered trees can be found efficiently. Empirical analysis shows that the proposed method achieves high accuracy even on large data sets.
Abstract:
Modern health information systems can generate several exabytes of patient data, the so-called "Health Big Data", per year. Many health managers and experts believe that, with this data, it is possible to easily discover useful knowledge to improve health policies, increase patient safety, and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for Health Big Data Analytics (BDA) – the process of extracting knowledge from sets of Health Big Data – and to design and evaluate a pipelined framework for use as a guideline or reference in health BDA.
Abstract:
Road networks are critical national infrastructure. Road assets need to be monitored and maintained efficiently as their condition deteriorates over time. The condition of one such asset, road pavement, plays a major role in road network maintenance programmes. Pavement condition depends on many factors, such as pavement type, traffic and environmental conditions. This paper presents a data analytics case study assessing the factors that affect the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquiring and integrating data from multiple sources, pre-processing the data, mining useful information from it, and using the data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary across different road, traffic and environmental conditions. The generated data mining models map the TSD outputs to classes and define correction factors for each class.
Abstract:
Term-based approaches can extract many features from text documents, but most of these features include noise. Many popular text-mining strategies have been adapted to reduce noisy information in extracted features; however, text-mining techniques suffer from the low-frequency problem. The key issue is how to discover relevance features in text documents to fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower-level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test the proposed method with extensive experiments on the Reuters Corpus Volume 1 dataset and TREC topics. Results show that the proposed approach significantly outperforms state-of-the-art models.
Abstract:
Mobile devices are rapidly developing into the primary technology for users to work, socialize, and play in a variety of settings and contexts. Their pervasiveness has provided researchers with the means to investigate innovative solutions to ever more complex user demands. Tools for Mobile Multimedia Programming and Development investigates the use of mobile platforms for research projects, focusing on the development, testing, and evaluation of prototypes rather than final products, which enables researchers to better understand the needs of users through image processing, object recognition, sensor integration, and user interactions. This book benefits researchers and professionals in multiple disciplines who utilize such techniques in the creation of prototypes for mobile devices and applications. This book is part of the Advances in Wireless Technologies and Telecommunication series collection.
Abstract:
This study was a step towards improving the performance of discovering useful knowledge in databases, specifically association rules. The thesis proposed an approach that uses granules instead of patterns to represent knowledge implicitly contained in relational databases, and a multi-tier structure to interpret association rules in terms of granules. Association mappings were proposed for the construction of the multi-tier structure. With these tools, association rules can be quickly assessed and meaningless association rules can be justified according to the association mappings. The experimental results indicated that the proposed approach is promising.
Abstract:
Variations that exist in the treatment of patients (with similar symptoms) across different hospitals do substantially impact the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between the practices across different hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms across four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis, and scientific workflows were applied to facilitate such comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability, and experimental repeatability are also presented.
Abstract:
This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.
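The simplest feature type mentioned above, individual activity frequency, can be sketched as follows. This is a minimal illustration under assumed inputs: the event log, activity names and function name are hypothetical, and the abstract does not state how the authors encoded their features.

```python
from collections import Counter

# Hypothetical event log: each trace is the ordered list of activities in one case.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]


def activity_frequency_features(traces):
    """Map each trace to a vector of per-activity counts over a shared vocabulary."""
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        # Counter returns 0 for activities that do not occur in this trace.
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = activity_frequency_features(traces)
```

Each resulting vector could then be passed, together with a normal/deviant label, to any standard classifier. Features of this kind discard the order of activities, which is the information that pattern-based features try to recover.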
Abstract:
Although the collection of player and ball tracking data is fast becoming the norm in professional sports, large-scale mining of such spatiotemporal data has yet to surface. In this paper, given an entire season's worth of player and ball tracking data from a professional soccer league (approx. 400,000,000 data points), we present a method that can conduct both individual player and team analysis. Due to the dynamic, continuous and multi-player nature of team sports like soccer, a major issue is aligning player positions over time. We present a "role-based" representation that dynamically updates each player's relative role at each frame and demonstrate how this captures the short-term context to enable both individual player and team analysis. We discover roles directly from data by using a minimum entropy data partitioning method and show how this can be used to accurately detect and visualize formations, as well as to analyze individual player behavior.
Abstract:
Experts with a trained eye can often identify a team by its unique style of play, based on its movement, passing and interactions. In this paper, we present a method that can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by using a formation descriptor that is found by minimizing the entropy of role-specific occupancy maps. We show that our approach is significantly better at identifying different teams than standard measures (e.g., shots and passes). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.
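To make the notion of occupancy-map entropy concrete, the sketch below computes the Shannon entropy of a coarse 2D histogram of positions. It is an assumption-laden illustration only: the grid size, pitch dimensions, sample coordinates and function name are hypothetical, and it shows just the quantity being minimized, not the authors' formation descriptor or optimization procedure.

```python
import math


def occupancy_entropy(positions, grid=(4, 4), field=(100.0, 60.0)):
    """Shannon entropy (in bits) of a 2D occupancy histogram over the pitch.

    Assumes every (x, y) lies within [0, width] x [0, height].
    """
    cols, rows = grid
    width, height = field
    counts = {}
    for x, y in positions:
        # Clamp to the last cell so points on the far boundary stay in range.
        cx = min(int(x / width * cols), cols - 1)
        cy = min(int(y / height * rows), rows - 1)
        counts[(cx, cy)] = counts.get((cx, cy), 0) + 1
    total = len(positions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples: one role that stays in one area, one that roams.
compact = [(10, 10), (12, 11), (11, 9), (13, 12)]
spread = [(10, 10), (60, 10), (10, 40), (60, 40)]
compact_entropy = occupancy_entropy(compact)
spread_entropy = occupancy_entropy(spread)
```

A role whose positions concentrate in one region yields a lower entropy than one spread evenly over several cells, which is why a low-entropy assignment of players to roles gives a sharper picture of a formation.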
Abstract:
This thesis presents an association rule mining approach called association hierarchy mining (AHM). Unlike traditional two-step bottom-up rule mining, AHM adopts a one-step top-down rule mining strategy to improve the efficiency and effectiveness of mining association rules from datasets. The thesis also presents a novel approach to evaluating the quality of the knowledge discovered by AHM, which focuses on the information difference between the discovered knowledge and the original datasets. Experiments on a real application, characterizing network traffic behaviour, show that AHM achieves encouraging performance.
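For readers unfamiliar with the measures that underlie association rule mining in general, the following sketch computes the standard support and confidence of a rule. It does not implement AHM; the transactions and item names are hypothetical and loosely themed on network traffic.

```python
# Hypothetical transactions; each is a set of items observed together.
transactions = [
    {"http", "tcp", "port80"},
    {"http", "tcp"},
    {"dns", "udp"},
    {"http", "tcp", "port80"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


rule_support = support({"http", "port80"}, transactions)
rule_confidence = confidence({"http"}, {"port80"}, transactions)
```

In the traditional two-step scheme, frequent itemsets are first found using a support threshold, and rules are then generated from them and filtered by confidence.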
Abstract:
Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees, called the balanced-optimal-search canonical form (BOCF), that can handle the isomorphism problem efficiently. Using BOCF, we define an enumeration approach, guided by a tree structure scheme, that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm, based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees in a database of labelled rooted unordered trees. Experiments on real datasets compare the efficiency of BOSTER against two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
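The role a canonical form plays in the isomorphism problem can be illustrated with a much simpler, well-known scheme: encode each subtree as a string and sort sibling encodings. This sketch is not BOCF, whose definition the abstract does not give; the tree representation and function name are assumptions, and labels are assumed not to contain parentheses.

```python
def canonical_form(tree):
    """Return a string that is identical for isomorphic unordered labelled trees.

    A tree is a (label, children) pair, where children is a list of trees.
    """
    label, children = tree
    # Sorting the children's encodings removes any dependence on sibling order.
    encoded = sorted(canonical_form(child) for child in children)
    return "(" + label + "".join(encoded) + ")"


# t1 and t2 differ only in sibling order; t3 has a different structure.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])
t3 = ("a", [("b", [("d", [])]), ("c", [])])
```

Two unordered trees are isomorphic exactly when their canonical strings match, so a subtree miner can compare or hash the strings to avoid counting the same candidate subtree more than once.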