77 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining

in Deakin Research Online - Australia


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Video event detection is an effective way to automatically understand the semantic content of a video. However, because of the mismatch between low-level visual features and high-level semantics, research on video event detection faces a number of challenges, such as how to extract suitable information from video, how to represent an event, and how to build a reasoning mechanism that infers the event from video information. In this paper, we propose a novel event detection method. The method detects video events based on the semantic trajectory, which is a high-level semantic description of a moving object’s trajectory in the video. The proposed method consists of three phases that transform low-level visual features into middle-level raw trajectory information and then into high-level semantic trajectory information. Event reasoning is then carried out with the assistance of semantic trajectory information and background knowledge. Additionally, to relieve users of the burden of manual event definition, a further method is proposed to automatically discover event-related semantic trajectory patterns from sample semantic trajectories. Furthermore, to use the discovered semantic trajectory patterns effectively, an associative classification-based event detection framework is adopted to identify events that may have occurred. Empirical studies show that our methods can detect video events effectively and efficiently.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The thesis investigates a set of critical problems in data mining and proposes four advanced pattern mining algorithms to discover, from data represented in complex structures, the most interesting and useful patterns that are highly relevant to the user’s application targets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Hotel managers continue to find ways to understand traveler preferences, with the aim of improving their strategic planning, marketing, and product development. Traveler preference is unpredictable; for example, hotel guests used to prefer having a telephone in the room, but now favor a fast Internet connection. Changes in preference influence the performance of hotel businesses, thus creating the need to identify and address the demands of their guests. Most existing studies focus on current demand attributes and not on emerging ones. Thus, hotel managers may find it difficult to make appropriate decisions in response to changes in travelers' concerns. To address these challenges, this paper adopts the Emerging Pattern Mining technique to identify emergent hotel features of interest to international travelers. Data are derived from 118,000 records of online reviews. The methods and findings can help hotel managers gain insights into travelers' interests, enabling them to better understand the rapid changes in tourist preferences.
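As an illustration only: emerging patterns are commonly defined through the growth rate of an itemset's support from one dataset to another. The sketch below shows that calculation on made-up data. The function names and the toy review sets are assumptions for illustration, not the paper's data or implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    transactions = list(transactions)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, old, new):
    """Ratio of support in `new` to support in `old` (infinite if absent from `old`)."""
    s_old = support(itemset, old)
    s_new = support(itemset, new)
    if s_old == 0.0:
        return float("inf") if s_new > 0.0 else 0.0
    return s_new / s_old


# Hypothetical toy data: features mentioned in reviews from two periods.
earlier_reviews = [
    frozenset({"telephone", "breakfast"}),
    frozenset({"telephone"}),
    frozenset({"breakfast", "wifi"}),
    frozenset({"telephone", "pool"}),
]
recent_reviews = [
    frozenset({"wifi", "breakfast"}),
    frozenset({"wifi"}),
    frozenset({"wifi", "pool"}),
    frozenset({"telephone"}),
]

wifi_growth = growth_rate(frozenset({"wifi"}), earlier_reviews, recent_reviews)
phone_growth = growth_rate(frozenset({"telephone"}), earlier_reviews, recent_reviews)
```

An itemset whose growth rate exceeds a chosen threshold would be reported as emerging. In this toy data the growth rate for "wifi" is above 1 and the one for "telephone" is below 1.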

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we propose the Incremental Sequential PAttern Discovery using Equivalence classes (IncSPADE) algorithm to mine a dynamic database without re-scanning the database. To evaluate this algorithm, we conducted experiments on three different artificial datasets. The results show that IncSPADE outperformed the benchmark algorithm, SPADE, by up to 20%.
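For orientation, the benchmark SPADE algorithm works on a vertical layout in which each item has an id-list of (sequence id, event id) pairs, and longer patterns are obtained by joining id-lists. The sketch below shows only that basic idea on a hypothetical toy database. It does not implement IncSPADE's incremental update, which the abstract does not describe, and all names are illustrative.

```python
from collections import defaultdict


def build_idlists(database):
    """Map each item to its id-list of (sid, eid) pairs.

    `database` maps a sequence id to a time-ordered list of events,
    where each event is a set of items.
    """
    idlists = defaultdict(list)
    for sid, events in database.items():
        for eid, event in enumerate(events):
            for item in event:
                idlists[item].append((sid, eid))
    return idlists


def temporal_join(first, second):
    """Id-list for 'first, then second' within the same sequence.

    Keeps each occurrence of `second` that comes strictly after the
    earliest occurrence of `first` in that sequence.
    """
    earliest = {}
    for sid, eid in first:
        if sid not in earliest or eid < earliest[sid]:
            earliest[sid] = eid
    return [(sid, eid) for sid, eid in second
            if sid in earliest and eid > earliest[sid]]


def sequence_support(idlist):
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})


# Hypothetical toy database: three sequences of events.
toy_db = {
    1: [{"a"}, {"b"}, {"c"}],
    2: [{"b"}, {"a"}],
    3: [{"a", "c"}, {"b"}],
}
idlists = build_idlists(toy_db)
a_then_b = temporal_join(idlists["a"], idlists["b"])
```

Because support is read directly from the joined id-list, counting the pattern "a then b" needs no further pass over the original sequences.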

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we discuss a special case of knowledge creation via pattern mining that was studied using a hermeneutic approach. The reported study explores the nature of knowledge creation by domain practitioners who do not communicate directly. The focus of this paper extends the traditional view of a knowledge creation process beyond organisational boundaries. The proposed knowledge creation framework explains the facilitated process of knowledge creation by its qualification, combination, socialisation, externalisation, internalisation and introspection, thus allowing the transformation of individual experience and knowledge into formalised shareable domain knowledge.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Data mining algorithms can undoubtedly produce numerous phrases. However, it is difficult to use these phrases effectively to answer what users want. Therefore, we present a pattern taxonomy extraction model that extracts descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The results of the experiment show that pattern-based methods outperform keyword-based methods. The results also indicate that removing meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis proposes three effective strategies to solve the significant performance-bias problem in imbalanced text mining: (1) creation of a novel inexact field learning algorithm to overcome the dual-imbalance problem; (2) introduction of a one-class classification framework to optimize classifier parameters; and (3) proposal of a maximal frequent itemset discovery approach to achieve higher accuracy and efficiency.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information portals are seen as an appropriate platform for personalised healthcare and wellbeing information provision. Efficient content management is a core capability of a successful smart health information portal (SHIP) and domain expertise is a vital input to content management when it comes to matching user profiles with the appropriate resources. The rate of generation of new health-related content far exceeds the numbers that can be manually examined by domain experts for relevance to a specific topic and audience. In this paper we investigate automated content discovery as a plausible solution to this shortcoming that capitalises on the existing database of expert-endorsed content as an implicit store of knowledge to guide such a solution. We propose a novel content discovery technique based on a text analytics approach that utilises an existing content repository to acquire new and relevant content. We also highlight the contribution of this technique towards realisation of smart content management for SHIPs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam filtering is to differentiate spam from legitimate email. Much work has been done to filter spam from legitimate emails using machine learning algorithms, and substantial performance has been achieved at the cost of some false positives (FP). In this paper, a spam filtering architecture based on a support vector machine (SVM) is proposed, which achieves better accuracy by reducing FP problems. Within this architecture, an innovative feature selection technique called dynamic feature selection (DFS) is proposed, which enhances the overall performance of the architecture while reducing FP problems. The experimental results show that the proposed technique performs better than similar existing techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies on proteomics indicate that the functions of different proteins could be assigned based upon protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps in understanding the evolutionary principles behind structure. By observing the architectures and topologies of the protein families, biological processes can be investigated more directly, with much higher resolution and finer detail. For this reason, the analysis of protein, its structure and its interaction with other materials is emerging as an important problem in bioinformatics. However, the determination of protein structures is experimentally expensive and time consuming, which at present makes scientists largely dependent on sequence, rather than more general structure, to infer the function of a protein. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which lack available data, the protein structure determination problem and the study of protein interactions can draw on a vast amount of biologically relevant information on proteins and their interactions, such as the protein data bank (PDB) [4], the structural classification of proteins (SCOP) databases [5], CATH databases [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins, as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in the protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of protein, the publicly accessible protein data resources and the current research status of protein analysis; Section 6.3 turns to one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys several existing works from the recent decade on protein structure analysis using advanced graph mining methods; finally, Section 6.5 gives a conclusion and outlines potential further work.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Case studies of the organizational implementation of traditional business computing have often emphasized the importance of context in research design and data analysis. The emergence of computing phenomena that pervade different contexts within and even beyond the organizational boundary suggests the need to disaggregate the notion of context to allow for finer levels of contextual analysis. Indeed we demonstrate that a failure to consider interdependent levels of context in organizational case studies of computing technologies that even approach ubiquity runs the risk of partial and even incorrect conclusions being drawn. We illustrate this argument by means of two explanatory case studies of intranet and mobile technology implementation in organizations. Based on the extant literature on context in case study design and examples drawn from the cases, we propose a range of interconnected and interrelated contexts to consider in the research design of explanatory cases of ubiquitous technology implementation in organizations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Discovering frequent patterns plays an essential role in many data mining applications. The aim of frequent pattern mining is to obtain information about the patterns that most commonly appear together. However, designing an efficient model to mine these patterns remains demanding because of the size of current databases. Therefore, we propose an Efficient Frequent Pattern Mining Model (EFP-M2) to mine frequent patterns in a timely manner. The results show that the algorithm in EFP-M2 outperforms the benchmark FP-Growth by at least two orders of magnitude.
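To make the task concrete, the sketch below mines frequent itemsets with a plain level-wise (Apriori-style) search on hypothetical toy transactions. It is deliberately naive and is neither EFP-M2 nor FP-Growth. It only shows what "frequent pattern" means for a given minimum count. The function name and data are assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # level 1 candidates
    result = {}
    while current:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent.
        k = len(next(iter(frequent))) + 1 if frequent else 0
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


# Hypothetical toy transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
patterns = frequent_itemsets(baskets, min_count=2)
```

Each level requires another pass over the transactions, which is the cost that more efficient mining models aim to reduce.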

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An indirect pattern is considered valuable, hidden information in a transactional database. It represents a high dependency between two items that rarely occur together but appear together indirectly via other items. Indirect pattern mining is important because it can reveal new knowledge in certain application domains. Therefore, we propose an Indirect Pattern Mining Algorithm (IPMA) to mine indirect patterns from a data repository. IPMA embeds a measure called Critical Relative Support (CRS) rather than the common interestingness measures. The results show that IPMA successfully generates indirect patterns under various threshold values.
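The notion of an indirect pattern can be illustrated with plain co-occurrence counts: two items that almost never appear together, yet each appears often with a shared mediator. The sketch below uses simple count thresholds and single-item mediators as simplifying assumptions. It does not implement IPMA or the CRS measure, whose definition is not given in the abstract, and all names and data are hypothetical.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def indirect_pairs(transactions, max_pair_count, min_mediator_count):
    """Return (a, b, mediator) triples where a and b rarely co-occur
    but each co-occurs often with the mediator item."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    found = []
    for a, b in combinations(items, 2):
        # Skip pairs that co-occur directly more often than allowed.
        if count(frozenset({a, b}), transactions) > max_pair_count:
            continue
        for m in items:
            if m in (a, b):
                continue
            if (count(frozenset({a, m}), transactions) >= min_mediator_count
                    and count(frozenset({b, m}), transactions) >= min_mediator_count):
                found.append((a, b, m))
    return found


# Hypothetical toy transactions.
orders = [
    {"coffee", "sugar"},
    {"coffee", "sugar"},
    {"tea", "sugar"},
    {"tea", "sugar"},
    {"sugar"},
]
triples = indirect_pairs(orders, max_pair_count=0, min_mediator_count=2)
```

In this toy data, "coffee" and "tea" never share a transaction, but both appear with "sugar", so the pair is reported with "sugar" as its mediator.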

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present a large-scale mood analysis of social media texts. The paper is organised in three parts: (1) we address the problem of feature selection and classification of mood in the blogosphere; (2) we extract global mood patterns at different levels of aggregation from a large-scale data set of approximately 18 million documents; and (3) we extract the mood trajectory for an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase, and yield classification results comparable to state-of-the-art supervised feature selection schemes. On mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature. On community structure discovery, a sentiment-based approach can yield useful insights into community formation.