415 results for Event Log Mining
Abstract:
Although many incidents involving fake online consumer reviews have been reported, very few studies have examined the trustworthiness of online consumer reviews to date. One reason is the lack of an effective computational method for separating untruthful reviews (i.e., spam) from legitimate ones (i.e., ham), given that prominent spam features are often missing in online reviews. The main contribution of our research is the development of a novel review spam detection method underpinned by an unsupervised inferential language modeling framework. Another contribution of this work is the development of a high-order concept association mining method which provides the essential term association knowledge to bootstrap the performance of untruthful review detection. Our experimental results confirm that the proposed inferential language model, equipped with high-order concept association knowledge, is more effective in untruthful review detection than other baseline methods.
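The abstract does not spell out the inferential language model or the concept association knowledge, so the sketch below only illustrates the general flavour of unsupervised, language-model-based review scoring: a review whose wording diverges sharply from a background model built on legitimate reviews receives a higher spam score. The function names, Laplace smoothing, and toy data are assumptions for illustration, not the authors' method.

```python
from collections import Counter
import math

def unigram_lm(texts, vocab=None):
    """Maximum-likelihood unigram model over a list of tokenised texts."""
    counts = Counter(tok for text in texts for tok in text)
    total = sum(counts.values())
    vocab = vocab or set(counts)
    # Laplace smoothing so unseen terms keep a non-zero probability
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def spam_score(review_tokens, background_model):
    """Average negative log-likelihood of the review under the background model.

    Reviews whose wording diverges strongly from the legitimate-review corpus
    receive high scores; a threshold on the score flags candidate spam.
    """
    probs = [background_model.get(tok, min(background_model.values()))
             for tok in review_tokens]
    return -sum(math.log(p) for p in probs) / max(len(probs), 1)

# Toy usage with hypothetical tokenised reviews
corpus = [["great", "battery", "life"], ["fast", "delivery", "good", "price"]]
background = unigram_lm(corpus)
print(spam_score(["buy", "now", "best", "deal", "ever"], background))
```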
Abstract:
Distributed Denial-of-Service (DDoS) attacks continue to be one of the most pernicious threats to the delivery of services over the Internet. Not only do DDoS attacks appear in many guises, they are also continuously evolving as new vulnerabilities are exploited. Hence, accurate detection of these attacks remains a challenging problem and a necessity for ensuring high-end network security. An intrinsic challenge in addressing this problem is to effectively distinguish these Denial-of-Service attacks from similar-looking Flash Events (FEs) created by legitimate clients. A considerable overlap between the general characteristics of FEs and DDoS attacks makes it difficult to precisely separate these two classes of Internet activity. In this paper, we propose parameters which can be used to explicitly distinguish FEs from DDoS attacks, and we analyse two real-world publicly available datasets to validate our proposal. Our analysis shows that even though FEs appear very similar to DDoS attacks, there are several subtle dissimilarities which can be exploited to separate these two classes of events.
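The discriminating parameters proposed in the paper are not listed in this abstract. As a rough illustration of the kind of per-window traffic features such an analysis might compute, the sketch below aggregates (timestamp, source IP) records into request counts, distinct-source counts, and the fraction of previously unseen sources; these feature choices are assumptions, not the paper's parameters.

```python
from collections import defaultdict

def window_features(packets, window=60):
    """Aggregate (timestamp, src_ip) records into per-window traffic features.

    Returns, per time window: total request count, number of distinct sources,
    and the fraction of sources not seen in any earlier window. A flash event
    driven by legitimate clients typically shows a gentler rise in brand-new
    sources than a bot-driven DDoS flood.
    """
    seen = set()
    buckets = defaultdict(list)
    for ts, src in sorted(packets):
        buckets[int(ts // window)].append(src)
    features = []
    for w in sorted(buckets):
        srcs = buckets[w]
        new = [s for s in set(srcs) if s not in seen]
        features.append({
            "window": w,
            "requests": len(srcs),
            "distinct_sources": len(set(srcs)),
            "new_source_fraction": len(new) / max(len(set(srcs)), 1),
        })
        seen.update(srcs)
    return features

# Toy usage with four synthetic requests
print(window_features([(0, "1.1.1.1"), (5, "2.2.2.2"), (70, "3.3.3.3"), (75, "1.1.1.1")]))
```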
Abstract:
Unusual event detection in crowded scenes remains challenging because of the diversity of events and noise. In this paper, we present a novel approach for unusual event detection via sparse reconstruction of dynamic textures over an overcomplete basis set, with the dynamic textures described by local binary patterns from three orthogonal planes (LBP-TOP). The overcomplete basis set is learnt from training data in which only normal items are observed. In the detection process, given a new observation, we compute the sparse coefficients using the Dantzig Selector algorithm, which was proposed in the compressed sensing literature. The reconstruction errors are then computed, and based on these we detect the abnormal items. Our approach can detect both local and global abnormal events. We evaluate our algorithm on the UCSD Abnormality Datasets for local anomaly detection, where it outperforms current state-of-the-art approaches, and we also obtain promising results for rapid escape detection on the PETS2009 dataset.
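A minimal sketch of the sparse-reconstruction detection idea follows, assuming descriptors (e.g. LBP-TOP histograms) have already been extracted. Here scikit-learn's DictionaryLearning learns the overcomplete basis and a Lasso solver stands in for the Dantzig Selector used in the paper; the data and thresholding are toy assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Lasso

# Hypothetical precomputed descriptors: rows are feature vectors (e.g. LBP-TOP
# histograms) extracted from normal training patches.
rng = np.random.default_rng(0)
X_normal = rng.random((200, 64))

# Learn an overcomplete basis from normal data only.
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=200, random_state=0)
dico.fit(X_normal)
D = dico.components_            # shape (128, 64)

def reconstruction_error(x, D, alpha=0.1):
    """Sparse-code x over dictionary D and return the reconstruction error.

    Lasso is used here as a convenient sparse solver; the paper applies the
    Dantzig Selector, which this sketch does not reproduce.
    """
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(D.T, x)
    return np.linalg.norm(x - D.T @ coder.coef_)

# Threshold set from errors on held-out normal patches; new patches exceeding it
# are flagged as abnormal.
threshold = np.percentile([reconstruction_error(x, D) for x in X_normal[:50]], 95)
x_new = rng.random(64)
print("abnormal" if reconstruction_error(x_new, D) > threshold else "normal")
```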
Abstract:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals' seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour performed so frequently that it has become habitual and involves minimal intention or decision making. Key variables investigated are the activity initiation timestamp and cell tower location, as well as the activity type and usage quantity (e.g., a voice call with its duration in seconds); the research focuses on customers' spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted using the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, making it robust to initial parameter choices and able to determine the number of components automatically and efficiently. The new algorithm we propose allows more effective modelling of individuals' highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this essentially reflects their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on VB-GMM.
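As a rough illustration of variational Bayesian GMM fitting on spatial usage data, the sketch below uses scikit-learn's BayesianGaussianMixture, which prunes unused components via a Dirichlet-process prior rather than the component-splitting extension developed in the thesis; the location data are synthetic.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical per-customer data: (x, y) cell-tower coordinates of activities.
rng = np.random.default_rng(1)
locations = np.vstack([
    rng.normal([0, 0], 0.2, size=(300, 2)),    # e.g. a home area
    rng.normal([5, 3], 0.3, size=(150, 2)),    # e.g. a work area
])

# Variational Bayesian GMM: components with negligible weight are pruned,
# giving an automatic estimate of the number of usage hotspots.
vb_gmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(locations)

active = vb_gmm.weights_ > 0.01
print("estimated hotspots:", active.sum())
print("hotspot centres:", vb_gmm.means_[active].round(2))
```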
Abstract:
In many applications, e.g., bioinformatics, web access traces, and system utilisation logs, the data is naturally in the form of sequences. There is great interest in analysing such sequential data and finding the inherent characteristics or relationships within it, and sequential association rule mining is one method for doing so. Because conventional sequential association rule mining very often generates a huge number of association rules, many of which are redundant, it is desirable to eliminate the unnecessary rules. Owing to the complexity and temporally ordered nature of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considers the redundancy problem in rule mining. The main contribution of this research is a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also define non-redundant sequential rules as sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. Further experiments compare the performance of generating non-redundant sequential rules against full sequential rules, and evaluate the performance of CSGM against other closed sequential pattern mining and generator mining algorithms. We also use the generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
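To illustrate the shape of non-redundant sequential rules (minimal antecedent, maximal consequent), the sketch below forms rules from toy closed sequences and minimal generators that are assumed to have been mined already; it does not implement CSGM itself, and the supports are made-up values.

```python
# Simplified illustration of forming non-redundant sequential rules from
# already-mined patterns: antecedents are minimal generators, consequents are
# the closed sequences that extend them.

closed_sequences = {          # closed frequent sequence -> support
    ("a", "b", "c"): 4,
    ("a", "d"): 5,
}
generators = {                # minimal generator -> support
    ("a", "b"): 6,
    ("a",): 9,
}

def is_subsequence(short, long):
    """True if `short` occurs in `long` preserving order (not necessarily contiguously)."""
    it = iter(long)
    return all(item in it for item in short)

rules = []
for gen, gen_sup in generators.items():
    for closed, closed_sup in closed_sequences.items():
        if gen != closed and is_subsequence(gen, closed):
            confidence = closed_sup / gen_sup
            rules.append((gen, closed, confidence))

for antecedent, consequent, conf in rules:
    print(f"{antecedent} => {consequent}  (confidence {conf:.2f})")
```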
Abstract:
Modelling events in densely crowded environments remains challenging due to the diversity of events and the noise in the scene. We propose a novel approach for anomalous event detection in crowded scenes using dynamic textures described by the Local Binary Patterns from Three Orthogonal Planes (LBP-TOP) descriptor. The scene is divided into spatio-temporal patches from which LBP-TOP based dynamic textures are extracted. We apply hierarchical Bayesian models to detect the patches containing unusual events. Our method is an unsupervised approach that does not rely on object tracking or background subtraction. We show that our approach outperforms existing state-of-the-art algorithms for anomalous event detection on the UCSD dataset.
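A minimal sketch of an LBP-TOP-style descriptor for a spatio-temporal patch is shown below, using scikit-image's local_binary_pattern on the three orthogonal planes; the parameter choices are illustrative and the hierarchical Bayesian detection step is not reproduced.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top_histogram(patch, P=8, R=1):
    """Concatenated LBP histograms from the three orthogonal planes of a
    spatio-temporal patch (T x H x W)."""
    t, h, w = patch.shape
    planes = [patch[t // 2],            # XY plane (a middle frame)
              patch[:, h // 2, :],      # XT plane
              patch[:, :, w // 2]]      # YT plane
    hists = []
    for plane in planes:
        codes = local_binary_pattern(plane, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        hists.append(hist)
    return np.concatenate(hists)

# Toy usage with a random grey-level video patch
patch = np.random.default_rng(2).integers(0, 256, size=(16, 24, 24)).astype(np.uint8)
print(lbp_top_histogram(patch).shape)   # (30,) for P=8
```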
Abstract:
INTRODUCTION: Workforce planning for first aid and medical coverage of mass gatherings is hampered by limited research. In particular, the characteristics and likely presentation patterns of low-volume mass gatherings of between several hundred to several thousand people are poorly described in the existing literature. OBJECTIVES: This study was conducted to: 1. Describe key patient and event characteristics of medical presentations at a series of mass gatherings, including events smaller than those previously described in the literature; 2. Determine whether event type and event size affect the mean number of patients presenting for treatment per event, and specifically, whether the 1:2,000 deployment rule used by St John Ambulance Australia is appropriate; and 3. Identify factors that are predictive of injury at mass gatherings. METHODS: A retrospective, observational, case-series design was used to examine all cases treated by two Divisions of St John Ambulance (Queensland) in the greater metropolitan Brisbane region over a three-year period (01 January 2002-31 December 2004). Data were obtained from routinely collected patient treatment forms completed by St John officers at the time of treatment. Event-related data (e.g., weather, event size) were obtained from event forms designed for this study. Outcome measures include: total and average number of patient presentations for each event; event type; and event size category. Descriptive analyses were conducted using chi-square tests, and mean presentations per event and event type were investigated using Kruskal-Wallis tests. Logistic regression analyses were used to identify variables independently associated with injury presentation (compared with non-injury presentations). RESULTS: Over the three-year study period, St John Ambulance officers treated 705 patients over 156 separate events. The mean number of patients who presented with any medical condition at small events (less than or equal to 2,000 attendees) did not differ significantly from that of large (>2,000 attendees) events (4.44 vs. 4.67, F = 0.72, df = 1, 154, p = 0.79). Logistic regression analyses indicated that presentation with an injury compared with non-injury was independently associated with male gender, winter season, and sporting events, even after adjusting for relevant variables. CONCLUSIONS: In this study of low-volume mass gatherings, a similar number of patients sought medical treatment at small (<2,000 patrons) and large (>2,000 patrons) events. This demonstrates that for low-volume mass gatherings, planning based solely on anticipated event size may be flawed, and could lead to inappropriate levels of first-aid coverage. This study also highlights the importance of considering other factors, such as event type and patient characteristics, when determining appropriate first-aid resourcing for low-volume events. Additionally, identification of factors predictive of injury presentations at mass gatherings has the potential to significantly enhance the ability of event coordinators to plan effective prevention strategies and response capability for these events.
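As an illustration of the kind of logistic regression analysis described in the methods, the sketch below fits an injury-vs-non-injury model with statsmodels; the column names and data are hypothetical placeholders, not the St John Ambulance dataset, and exponentiated coefficients give adjusted odds ratios.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical patient-presentation records; column names are illustrative and
# do not come from the St John Ambulance treatment forms.
rng = np.random.default_rng(3)
n = 705
df = pd.DataFrame({
    "injury": rng.integers(0, 2, n),
    "male": rng.integers(0, 2, n),
    "winter": rng.integers(0, 2, n),
    "sporting_event": rng.integers(0, 2, n),
    "large_event": rng.integers(0, 2, n),
})

# Logistic regression of injury (vs non-injury) presentation, adjusting for the
# other covariates.
model = smf.logit("injury ~ male + winter + sporting_event + large_event", data=df).fit(disp=False)
print(np.exp(model.params).round(2))   # adjusted odds ratios
```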
Abstract:
Current approaches to the regulation of coal mining activities in Australia have facilitated the extraction of substantial amounts of coal and coal seam gas. The regulation of coal mining activities must now achieve the reduction or mitigation of greenhouse gas emissions in order to address the challenge of climate change and achieve ecologically sustainable development. Several legislative mechanisms currently exist which appear to offer the means to bring about the reduction or mitigation of greenhouse gas emissions from coal mining activities, yet Australia’s emissions from coal mining continue to rise. This article critiques these existing legislative mechanisms and presents recommendations for reform.
Abstract:
Road asset managers are overwhelmed with a high volume of raw data which they need to process and utilise to support their decision making. This paper presents a method that processes road-crash data for a whole road network and exposes hidden value in the data by applying clustering, a data mining method. The goal is to partition the road network into a set of groups (classes) based on common attributes and to characterise each class by its crash types, producing a crash profile for each cluster. By comparing similar road classes that have differing crash types and rates, insight can be gained into how these differences arise from the particular characteristics of the roads. These differences can then be used as evidence in knowledge development and decision support.
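A minimal sketch of the clustering-and-profiling idea follows; the road attributes, crash types, and the use of k-means are illustrative assumptions, since the abstract does not name the specific clustering algorithm or variables.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical road-segment records; feature and crash-type columns are
# illustrative, and k-means stands in for whichever clustering method is used.
rng = np.random.default_rng(4)
roads = pd.DataFrame({
    "aadt": rng.integers(500, 40000, 300),          # annual average daily traffic
    "speed_limit": rng.choice([50, 60, 80, 100], 300),
    "lane_count": rng.integers(1, 4, 300),
    "crash_type": rng.choice(["rear_end", "head_on", "run_off_road"], 300),
})

# Partition the network into road classes from the road attributes alone.
features = StandardScaler().fit_transform(roads[["aadt", "speed_limit", "lane_count"]])
roads["road_class"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# Crash profile per road class: distribution of crash types within each cluster.
profile = pd.crosstab(roads["road_class"], roads["crash_type"], normalize="index")
print(profile.round(2))
```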
Abstract:
Most recommendation methods employ item-item similarity measures or use ratings data to generate recommendations. These methods use traditional two-dimensional models to find interrelationships between similar users and products. This paper proposes a novel recommendation method that uses a multi-dimensional model, the tensor, to group similar users based on common search behaviour, and then finds associations within such groups for making effective inter-group recommendations. Web log data is multi-dimensional. Unlike vector-based methods, tensors have the ability to highly correlate and find latent relationships between such similar instances, consisting of users and searches. Non-redundant rules derived from these user-search associations are then used to make recommendations to the users.
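As a rough illustration, the sketch below builds a small user x query-term x item count tensor from toy web-log triples and groups users by factors obtained from an SVD of the user-mode unfolding, a simple stand-in for a full tensor decomposition; the subsequent association rule mining within groups is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy web-log triples (user, query term, clicked item); names are hypothetical.
logs = [("u1", "laptop", "p1"), ("u1", "charger", "p2"), ("u2", "laptop", "p1"),
        ("u2", "charger", "p2"), ("u3", "guitar", "p3"), ("u3", "strings", "p4")]

users = sorted({u for u, _, _ in logs})
terms = sorted({q for _, q, _ in logs})
items = sorted({i for _, _, i in logs})

# 3-way count tensor: user x query term x item.
T = np.zeros((len(users), len(terms), len(items)))
for u, q, i in logs:
    T[users.index(u), terms.index(q), items.index(i)] += 1

# User factors from an SVD of the user-mode unfolding (a simple stand-in for a
# full tensor decomposition such as PARAFAC/Tucker).
unfolded = T.reshape(len(users), -1)
U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
user_factors = U[:, :2] * s[:2]

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(user_factors)
print(dict(zip(users, groups)))   # users with similar search behaviour share a group
```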
Abstract:
We propose to use Tensor Space Modeling (TSM) to represent and analyze users' web log data, which reflects multiple interests and spans multiple dimensions. Further, we propose to use the tensor decomposition factors to cluster users based on similarity of search behaviour. Preliminary results show that the proposed method outperforms traditional Vector Space Model (VSM) based clustering.
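A minimal sketch of TSM-style clustering is shown below, assuming the tensorly library is available: a CP (PARAFAC) decomposition yields a user-mode factor matrix whose rows are clustered, with a flattened VSM matrix clustered for comparison; the tensor data, rank, and cluster counts are arbitrary toy choices.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from sklearn.cluster import KMeans

# Toy user x query-term x session tensor; values are hypothetical counts.
rng = np.random.default_rng(5)
T = tl.tensor(rng.poisson(1.0, size=(20, 30, 5)).astype(float))

# CP (PARAFAC) decomposition: the user-mode factor matrix places each user in a
# low-dimensional latent space of search behaviour.
weights, factors = parafac(T, rank=3)
user_factors = factors[0]                      # shape (20, 3)

tsm_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(user_factors)

# VSM baseline for comparison: cluster the flattened user x (term, session) matrix.
vsm_matrix = tl.to_numpy(T).reshape(20, -1)
vsm_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vsm_matrix)

print("TSM-based groups:", tsm_labels)
print("VSM-based groups:", vsm_labels)
```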