254 results for Data Mining, Rough Sets, Multi-Dimension, Association Rules, Constraint


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, Web 2.0 has provided considerable facilities for people to create, share and exchange information and ideas. As a result, user-generated content, such as reviews, has exploded. Such data provide a rich source to exploit in order to identify the information associated with specific reviewed items. Opinion mining has been widely used to identify the significant features of items (e.g., cameras) based upon user reviews. Feature extraction is the most critical step in identifying useful information from texts. Most existing approaches only find individual features of a product without revealing the structural relationships that usually exist between the features. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called a feature taxonomy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature taxonomy profiles the product at multiple levels and provides more detailed information about the product. Our experimental results on several widely used review datasets show that the proposed approach is able to capture the product features and relations effectively.

Abstract:

Modern health information systems can generate several exabytes of patient data, the so-called "Health Big Data", per year. Many health managers and experts believe that with these data it is possible to easily discover useful knowledge to improve health policies, increase patient safety and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for Health Big Data Analytics (BDA), the process of extracting knowledge from sets of Health Big Data, and to design and evaluate a pipelined framework for use as a guideline or reference in health BDA.

Abstract:

Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
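The similarity-vector idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses the standard definition of the Stein divergence between symmetric positive-definite matrices, S(X, Y) = log det((X + Y) / 2) - (1/2) log det(XY), and the mapping from divergence to similarity via `exp(-beta * S)`, the parameter `beta`, and the function names are all assumptions made for the example.

```python
import math

def logdet_spd(a):
    """log(det(a)) for a symmetric positive-definite matrix given as a list of lists."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    # Plain Cholesky factorisation a = L * L^T.
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    # det(a) is the squared product of the diagonal of L.
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def stein_divergence(x, y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y)."""
    n = len(x)
    mid = [[(x[i][j] + y[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    return logdet_spd(mid) - 0.5 * (logdet_spd(x) + logdet_spd(y))

def similarity_vector(point, representers, beta=1.0):
    """Describe one covariance matrix by its similarity to each class representer.

    The exp(-beta * S) mapping is an assumed choice for this sketch.
    """
    return [math.exp(-beta * stein_divergence(point, r)) for r in representers]

identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[4.0, 0.0], [0.0, 4.0]]
vector = similarity_vector(identity, [identity, scaled])
```

The resulting vector lives in an ordinary Euclidean space, so any standard discriminative classifier can be applied to it.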

Abstract:

Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters representing different characteristics of an object, due to undesirable differences in environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to resemble the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).

Abstract:

This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.
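The simplest feature type mentioned above, individual activity frequency, can be illustrated as follows. This is a sketch under stated assumptions: each trace is taken to be a list of activity names, and the function name and the toy event log are hypothetical, not drawn from the paper.

```python
from collections import Counter

def activity_frequency_features(traces):
    """Turn each trace into a vector of per-activity occurrence counts.

    Returns the sorted activity alphabet and one count vector per trace,
    with columns in the same order as the alphabet.
    """
    alphabet = sorted({activity for trace in traces for activity in trace})
    rows = []
    for trace in traces:
        counts = Counter(trace)
        rows.append([counts[activity] for activity in alphabet])
    return alphabet, rows

# Hypothetical event log: one normal and one deviant execution.
log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]
alphabet, features = activity_frequency_features(log)
```

Each row can then be paired with a normal or deviant label and passed to any standard classifier. Pattern-based feature types would add columns for frequent subsequences rather than single activities.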

Abstract:

Early transcriptional activation events that occur in bladder immediately following bacterial urinary tract infection (UTI) are not well defined. In this study, we describe the whole bladder transcriptome of uropathogenic Escherichia coli (UPEC) cystitis in mice using genome-wide expression profiling to define the transcriptome of innate immune activation stemming from UPEC colonization of the bladder. Bladder RNA from female C57BL/6 mice, analyzed using 1.0 ST-Affymetrix microarrays, revealed extensive activation of diverse sets of innate immune response genes, including those that encode multiple IL-family members, receptors, metabolic regulators, MAPK activators, and lymphocyte signaling molecules. These were among 1564 genes differentially regulated at 2 h postinfection, highlighting a rapid and broad innate immune response to bladder colonization. Integrative systems-level analyses using InnateDB (http://www.innatedb.com) bioinformatics and ingenuity pathway analysis identified multiple distinct biological pathways in the bladder transcriptome with extensive involvement of lymphocyte signaling, cell cycle alterations, cytoskeletal, and metabolic changes. A key regulator of IL activity identified in the transcriptome was IL-10, which was analyzed functionally to reveal marked exacerbation of cystitis in IL-10–deficient mice. Studies of clinical UTI revealed significantly elevated urinary IL-10 in patients with UPEC cystitis, indicating a role for IL-10 in the innate response to human UTI. The whole bladder transcriptome presented in this work provides new insight into the diversity of innate factors that determine UTI on a genome-wide scale and will be valuable for further data mining. Identification of protective roles for other elements in the transcriptome will provide critical new insight into the complex cascade of events that underpin UTI.

Abstract:

Due to the popularity of security cameras in public places, it is of interest to design an intelligent system that can efficiently detect events automatically. This paper proposes a novel algorithm for multi-person event detection. To ensure faster than real-time performance, features are extracted directly from compressed MPEG video. A novel histogram-based feature descriptor that captures the angles between extracted particle trajectories is proposed, which allows us to capture motion patterns of multi-person events in the video. To alleviate the need for fine-grained annotation, we propose the use of Labelled Latent Dirichlet Allocation, a “weakly supervised” method that allows the use of coarse temporal annotations, which are much simpler to obtain. This novel system is able to run at approximately ten times real-time, while preserving state-of-the-art detection performance for multi-person events on a 100-hour real-world surveillance dataset (TRECVid SED).

Abstract:

Although the collection of player and ball tracking data is fast becoming the norm in professional sports, large-scale mining of such spatiotemporal data has yet to surface. In this paper, given an entire season's worth of player and ball tracking data from a professional soccer league (approximately 400,000,000 data points), we present a method which can conduct both individual player and team analysis. Due to the dynamic, continuous and multi-player nature of team sports like soccer, a major issue is aligning player positions over time. We present a "role-based" representation that dynamically updates each player's relative role at each frame and demonstrate how this captures the short-term context to enable both individual player and team analysis. We discover roles directly from data by utilizing a minimum entropy data partitioning method and show how this can be used to accurately detect and visualize formations, as well as analyze individual player behavior.

Abstract:

Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes, hours, days to years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and not therefore intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.
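One way to see why about ten steps is a plausible figure for three orders of magnitude is a simple geometric ladder. The sketch below assumes each zoom step changes the temporal scale by a constant factor of two; that factor, the function names, and the example scales are assumptions for illustration and are not stated in the abstract.

```python
import math

def zoom_steps(coarsest, finest, factor=2.0):
    """Number of constant-factor steps needed to zoom from coarsest to finest scale."""
    return math.ceil(math.log(coarsest / finest) / math.log(factor))

def scale_ladder(coarsest, finest, factor=2.0):
    """Scales visited when zooming in, starting at the coarsest scale."""
    steps = zoom_steps(coarsest, finest, factor)
    return [coarsest / factor ** k for k in range(steps + 1)]

# Hypothetical example: a range of scales spanning a ratio of 1000.
steps = zoom_steps(1000.0, 1.0)
ladder = scale_ladder(1000.0, 1.0)
```

Under the doubling assumption, a ratio of 1000 needs ten steps because 2 to the power 10 is 1024, the first power of two that exceeds 1000.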

Abstract:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.

Abstract:

User-generated information such as product reviews has been booming due to the advent of Web 2.0. In particular, rich information associated with reviewed products is buried in such big data. To facilitate identifying useful information from product (e.g., camera) reviews, opinion mining has been proposed and widely used in recent years. As the most critical step of opinion mining, feature extraction aims to extract significant product features from review texts. However, most existing approaches only find individual features rather than identifying the hierarchical relationships between the product features. In this paper, we propose an approach which finds both features and feature relationships, structured as a feature hierarchy which is referred to as a feature taxonomy in the remainder of the paper. Specifically, by making use of frequent patterns and association rules, we construct the feature taxonomy to profile the product at multiple levels instead of a single level, which provides more detailed information about the product. Experiments conducted on several real-world review datasets show that our proposed method is capable of identifying product features and relations effectively.
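The two measures behind frequent patterns and association rules, support and confidence, can be sketched as follows. This is a generic illustration, not the paper's algorithm: each review is assumed to have been reduced to a set of candidate feature terms, and the toy data and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical reviews, each reduced to the feature terms it mentions.
reviews = [
    {"camera", "lens", "zoom"},
    {"camera", "battery"},
    {"camera", "lens"},
    {"battery", "charger"},
]
pair_support = support(reviews, {"camera", "lens"})
lens_to_camera = confidence(reviews, {"lens"}, {"camera"})
camera_to_lens = confidence(reviews, {"camera"}, {"lens"})
```

In this toy data, "lens" always appears together with "camera" while the reverse holds only part of the time. An asymmetry of this kind is one possible signal for placing "lens" beneath "camera" in a hierarchy, although how the paper actually derives parent and child relations is not specified in the abstract.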

Abstract:

Acoustics is a rich source of environmental information that can reflect ecological dynamics. To deal with the escalating volume of acoustic data, a variety of automated classification techniques have been used for acoustic pattern or scene recognition, covering urban soundscapes such as streets and restaurants and natural soundscapes such as rain and thunder. It is common to classify acoustic patterns under the assumption that a single type of soundscape is present in an audio clip. This assumption is reasonable for some carefully selected audio clips. However, only a few experiments have focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a binary relevance based multi-label classification approach to recognise simultaneous acoustic patterns in one-minute audio clips. By utilising acoustic indices as global features and a multilayer perceptron as the base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, the multi-label classification approach provides more detailed information about the distributions of various acoustic patterns in long-duration recordings. These results merit further biodiversity investigations, such as bird species surveys.
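Binary relevance, the multi-label strategy named above, trains one independent binary classifier per label. The sketch below shows the decomposition only. To stay dependency-free it swaps the paper's multilayer perceptron for a single-layer perceptron, and the class names, the two-feature toy data, and the label meanings are all hypothetical.

```python
class Perceptron:
    """Single-layer perceptron, standing in for the paper's multilayer perceptron."""

    def __init__(self, n_features, epochs=20, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epochs = epochs
        self.lr = lr

    def predict_one(self, x):
        score = sum(w * v for w, v in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def fit(self, X, y):
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                err = yi - self.predict_one(xi)
                if err:
                    # Classic perceptron update on misclassified samples only.
                    self.w = [w + self.lr * err * v for w, v in zip(self.w, xi)]
                    self.b += self.lr * err
        return self


class BinaryRelevance:
    """One independent binary classifier per label."""

    def __init__(self, n_features, n_labels):
        self.models = [Perceptron(n_features) for _ in range(n_labels)]

    def fit(self, X, Y):
        for j, model in enumerate(self.models):
            model.fit(X, [row[j] for row in Y])
        return self

    def predict(self, X):
        return [[m.predict_one(x) for m in self.models] for x in X]


# Hypothetical clips: two index-like features, two labels (e.g. birds, rain).
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]
predictions = BinaryRelevance(n_features=2, n_labels=2).fit(X, Y).predict(X)
```

Because each label gets its own model, a clip can receive several labels at once, or none, which is what allows simultaneous acoustic patterns to be recognised.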