47 results for "Extração semi-automática" (semi-automatic extraction)


Relevance: 20.00%

Abstract:

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in the network domain a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints during clustering can significantly improve overall accuracy and cluster purity.
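
As a rough illustration of the must-link idea described in this abstract, the sketch below groups flows that share a destination host, port and protocol, then assigns each group to a K-Means centroid as a single unit. It is a minimal sketch under stated assumptions, not the authors' algorithm: the flow field names (`dst_ip`, `dst_port`, `proto`), the two-dimensional features, the initial centroids and the omission of the "simultaneously" time condition are all illustrative choices, and only the hard-constraint case is shown.

```python
from collections import defaultdict

def must_link_groups(flows):
    """Group flow indices sharing (dst_ip, dst_port, proto); each group is a must-link set."""
    groups = defaultdict(list)
    for idx, flow in enumerate(flows):
        groups[(flow["dst_ip"], flow["dst_port"], flow["proto"])].append(idx)
    return list(groups.values())

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_groups(features, groups, centroids):
    """Assign every must-link group, as a unit, to the centroid with the lowest total cost."""
    labels = [None] * len(features)
    for members in groups:
        costs = [sum(sq_dist(features[i], c) for i in members) for c in centroids]
        best = costs.index(min(costs))
        for i in members:
            labels[i] = best
    return labels

def update_centroids(features, labels, centroids):
    """Recompute each centroid as the mean of its members; keep empty clusters unchanged."""
    updated = []
    for k, old in enumerate(centroids):
        points = [f for f, lab in zip(features, labels) if lab == k]
        if points:
            updated.append(tuple(sum(col) / len(points) for col in zip(*points)))
        else:
            updated.append(old)
    return updated

def constrained_kmeans(features, groups, centroids, iterations=10):
    """K-Means in which must-link groups can never be split across clusters."""
    labels = assign_groups(features, groups, centroids)
    for _ in range(iterations):
        centroids = update_centroids(features, labels, centroids)
        labels = assign_groups(features, groups, centroids)
    return labels, centroids

# Hypothetical flows: indices 0-2 visit the same host and port, 3-4 another one.
flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp"},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp"},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp"},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp"},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp"},
]
features = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.2), (5.0, 5.0), (5.2, 4.8)]
initial_centroids = [(1.0, 1.0), (5.0, 5.0)]

groups = must_link_groups(flows)
labels, centroids = constrained_kmeans(features, groups, initial_centroids)
print(labels)
```

In this toy data, flow 2 lies closer to the second centroid on its statistics alone, but the must-link constraint keeps it with flows 0 and 1. Soft constraints and metric learning, which the abstract also mentions, would need a penalty term or a learned distance and are not covered here.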

Relevance: 20.00%

Abstract:

The ability to learn and recognize human activities of daily living (ADLs) is important in building pervasive and smart environments. In this paper, we tackle this problem using the hidden semi-Markov model. We discuss the state-of-the-art duration modeling choices and then address a large class of exponential family distributions to model state durations. Inference and learning are efficiently addressed by providing a graphical representation for the model in terms of a dynamic Bayesian network (DBN). We investigate both discrete and continuous distributions from the exponential family (Poisson and Inverse Gaussian respectively) for the problem of learning and recognizing ADLs. A full comparison between the exponential family duration models and other existing models, including the traditional multinomial and the new Coxian, is also presented. Our work thus completes a thorough investigation into duration modeling and its application to human activity recognition in a real-world smart home surveillance scenario.

Relevance: 20.00%

Abstract:

This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities, where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using a multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models, including the flat HSMM and the hierarchical hidden Markov model, in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.

Relevance: 20.00%

Abstract:

Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is an important issue in practice, where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.

Relevance: 20.00%

Abstract:

In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM). Both allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, thus enabling efficient inference and reducing the sample complexity in learning. Additionally, the S-HSMM allows duration information to be incorporated; consequently, the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of the Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experiments with the proposed framework on twelve educational and training videos show that both models outperform the baseline cases (flat HMM and HSMM) and the performance reported in earlier work on topic detection. The superior performance of the S-HSMM over the HHMM supports our belief that duration information is an important factor in video content modeling.

Relevance: 20.00%

Abstract:

In this paper, we exploit the discrete Coxian distribution and propose a novel form of stochastic model, termed the Coxian hidden semi-Markov model (Cox-HSMM), and apply it to the task of recognising activities of daily living (ADLs) in a smart house environment. The use of the Coxian has several advantages over traditional parameterizations (e.g. multinomial or continuous distributions), including the low number of free parameters needed, its computational efficiency, and the existence of a closed-form solution. To further enrich the model in real-world applications, we also address the problem of handling missing observations for the proposed Cox-HSMM. In the domain of ADLs, we emphasize the importance of duration information and model it via the Cox-HSMM. Our experimental results have shown the superiority of the Cox-HSMM in all cases when compared with the standard HMM. Our results have further shown that outstanding recognition accuracy can be achieved with a relatively low number of phases in the Coxian, thus making the Cox-HSMM particularly suitable for recognizing ADLs whose movement trajectories are typically very long in nature.
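
To make the "low number of free parameters" point concrete, the sketch below computes the probability mass function of a discrete Coxian duration. It assumes one common formulation, which this abstract does not spell out: the duration starts in phase i with probability `mu[i]`, then passes through phases i, i+1, ..., M in order, and each phase lasts a geometric number of steps with parameter `lam[i]`. The names `mu`, `lam` and `max_d` and the example values are illustrative only.

```python
def geometric_pmf(lam, max_d):
    """P(X = d) = lam * (1 - lam)**(d - 1) for d = 1..max_d; index 0 holds zero mass."""
    return [0.0] + [lam * (1.0 - lam) ** (d - 1) for d in range(1, max_d + 1)]

def convolve(p, q, max_d):
    """Distribution of the sum of two independent durations, truncated at max_d."""
    out = [0.0] * (max_d + 1)
    for i, p_i in enumerate(p):
        if p_i == 0.0:
            continue
        for j, q_j in enumerate(q):
            if i + j > max_d:
                break
            out[i + j] += p_i * q_j
    return out

def coxian_pmf(mu, lam, max_d):
    """Mixture over starting phases of the sum of geometric sojourns in phases i..M."""
    pmf = [0.0] * (max_d + 1)
    tail = [1.0] + [0.0] * max_d  # point mass at 0: no phases visited yet
    for i in range(len(lam) - 1, -1, -1):
        # tail becomes the distribution of X_i + X_(i+1) + ... + X_M
        tail = convolve(geometric_pmf(lam[i], max_d), tail, max_d)
        for d in range(max_d + 1):
            pmf[d] += mu[i] * tail[d]
    return pmf

# Hypothetical three-phase example: six numbers describe durations up to 400 steps.
mu = [0.5, 0.3, 0.2]
lam = [0.2, 0.1, 0.05]
pmf = coxian_pmf(mu, lam, max_d=400)
print(round(sum(pmf), 6))
```

Under this assumption the three-phase model is described by six numbers, whereas an explicit multinomial over the same range would need one probability per possible duration.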

Relevance: 20.00%

Abstract:

Both instance-level knowledge and attribute-level knowledge can improve clustering quality, but how to effectively utilize both of them is an essential problem to solve. This paper proposes a wrapper framework for semi-supervised clustering, which aims to gracefully integrate both kinds of prior knowledge in the clustering process: instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of attribute order preferences. The wrapped algorithm is then designed as a semi-supervised clustering process which transforms this clustering problem into an optimization problem. The experimental results demonstrate the effectiveness and potential of the proposed method.

Relevance: 20.00%

Abstract:

This paper presents a new semi-supervised method to effectively improve traffic classification performance when little supervised training data is available. Existing semi-supervised methods label a large proportion of testing flows as unknown flows due to limited supervised information, which severely affects the classification performance. To address this problem, we propose to incorporate flow correlation into both training and testing stages. At the training stage, we make use of flow correlation to extend the supervised data set by automatically labeling unlabeled flows according to their correlation to the pre-labeled flows. Consequently, the traffic classifier has better performance due to the extended size and quality of the supervised data sets. At the testing stage, the correlated flows are identified and classified jointly by combining their individual predictions, so as to further boost the classification accuracy. The empirical study on real-world network traffic shows that the proposed method outperforms state-of-the-art classification methods based on flow statistical features.
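
The two stages described in this abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the abstract does not say how correlated flows are found or how their predictions are combined, so the precomputed `groups`, the rule of averaging class probabilities, and all names and values below are illustrative rather than the authors' method.

```python
def propagate_labels(groups, labels):
    """Training stage: copy a known label to unlabeled flows in the same correlated group."""
    extended = dict(labels)
    for members in groups:
        known = {labels[i] for i in members if i in labels}
        if len(known) == 1:  # skip groups with no label or with conflicting labels
            label = next(iter(known))
            for i in members:
                extended.setdefault(i, label)
    return extended

def classify_jointly(groups, probs):
    """Testing stage: average per-flow class probabilities within each group (an assumed rule)."""
    result = {}
    for members in groups:
        classes = probs[members[0]].keys()
        avg = {c: sum(probs[i][c] for i in members) / len(members) for c in classes}
        best = max(avg, key=avg.get)
        for i in members:
            result[i] = best
    return result

# Hypothetical data: flows 0-2 are correlated, flow 3 stands alone.
groups = [[0, 1, 2], [3]]
labels = {0: "http"}  # only flow 0 is pre-labeled
probs = {
    0: {"http": 0.9, "dns": 0.1},
    1: {"http": 0.6, "dns": 0.4},
    2: {"http": 0.4, "dns": 0.6},
    3: {"http": 0.2, "dns": 0.8},
}

extended = propagate_labels(groups, labels)
predictions = classify_jointly(groups, probs)
print(extended, predictions)
```

In this toy data, flow 2 would be classified as `dns` on its own prediction, but the joint decision for its group is `http`.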

Relevance: 20.00%

Abstract:

In current constraint-based (Pearl-style) systems for discovering Bayesian networks, inputs with deterministic relations are prohibited. This restricts the applicability of these systems. In this paper, we formalize a sufficient condition under which Bayesian networks can be recovered even with deterministic relations. The sufficient condition leads to an improvement to Pearl’s IC algorithm; other constraint-based algorithms can be similarly improved. The new algorithm, assuming the sufficient condition proposed, is able to recover Bayesian networks with deterministic relations, and moreover suffers no loss of performance when applied to nondeterministic Bayesian networks.

Relevance: 20.00%

Abstract:

A key task in ecology is to understand the drivers of animal distributions. In arid and semi-arid environments, this is challenging because animal populations show considerable spatial and temporal variation. An effective approach in such systems is to examine both broad-scale and long-term data. We used this approach to investigate the distribution of small mammal species in semi-arid ‘mallee’ vegetation in south-eastern Australia. First, we examined broad-scale data collected at 280 sites across the Murray Mallee region. We used generalized additive mixed models (GAMMs) to examine four hypotheses concerning factors that influence the distribution of individual mammal species at this scale: vegetation structure, floristic diversity, topography and recent rainfall. Second, we used long-term data from a single conservation reserve (surveyed from 1997 to 2012) to examine small mammal responses to rainfall over a period spanning a broad range of climatic conditions, including record high rainfall in 2011. Small mammal distributions were strongly associated with vegetation structure and rainfall patterns, but the relative importance of these drivers was species-specific. The distribution of the mallee ningaui Ningaui yvonneae, for example, was largely determined by the cover of hummock grass; whereas the occurrence of the western pygmy possum Cercartetus concinnus was most strongly associated with above-average rainfall. Further, the combination of both broad-scale and long-term data provided valuable insights. Bolam's mouse Pseudomys bolami was uncommon during the broad-scale survey, but long-term surveys showed that it responds positively to above-average rainfall. Conceptual models developed for small mammals in temperate and central arid Australia, respectively, were not, on their own, adequate to account for the distributional patterns of species in this semi-arid ecosystem. Species-specific variation in the relative importance of different drivers was more effectively explained by qualitative differences in life-history attributes among species.