866 resultados para mining closure
Resumo:
Despite many incidents about fake online consumer reviews have been reported, very few studies have been conducted to date to examine the trustworthiness of online consumer reviews. One of the reasons is the lack of an effective computational method to separate the untruthful reviews (i.e., spam) from the legitimate ones (i.e., ham) given the fact that prominent spam features are often missing in online reviews. The main contribution of our research work is the development of a novel review spam detection method which is underpinned by an unsupervised inferential language modeling framework. Another contribution of this work is the development of a high-order concept association mining method which provides the essential term association knowledge to bootstrap the performance for untruthful review detection. Our experimental results confirm that the proposed inferential language model equipped with high-order concept association knowledge is effective in untruthful review detection when compared with other baseline methods.
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
Resumo:
In many applications, e.g., bioinformatics, web access traces, system utilisation logs, etc., the data is naturally in the form of sequences. People have taken great interest in analysing the sequential data and finding the inherent characteristics or relationships within the data. Sequential association rule mining is one of the possible methods used to analyse this data. As conventional sequential association rule mining very often generates a huge number of association rules, of which many are redundant, it is desirable to find a solution to get rid of those unnecessary association rules. Because of the complexity and temporal ordered characteristics of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considered the redundancy problem in rule mining. The main contribution of this research is to propose a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition for the non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. A further experiment has been done to compare the performance of generating non-redundant sequential rules and full sequential rules, meanwhile, performance evaluation of our CSGM and other closed sequential pattern mining or generator mining algorithms has also been conducted. We also use generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
Resumo:
Current approaches to the regulation of coal mining activities in Australia have facilitated the extraction of substantial amounts of coal and coal seam gas. The regulation of coal mining activities must now achieve the reduction or mitigation of greenhouse gas emissions in order to address the challenge of climate change and achieve ecologically sustainable development. Several legislative mechanisms currently exist which appear to offer the means to bring about the reduction or mitigation of greenhouse gas emissions from coal mining activities, yet Australia’s emissions from coal mining continue to rise. This article critiques these existing legislative mechanisms and presents recommendations for reform.
Resumo:
Road asset managers are overwhelmed with a high volume of raw data which they need to process and utilise in supporting their decision making. This paper presents a method that processes road-crash data of a whole road network and exposes hidden value inherent in the data by deploying the clustering data mining method. The goal of the method is to partition the road network into a set of groups (classes) based on common data and characterise the class crash types to produce a crash profiles for each cluster. By comparing similar road classes with differing crash types and rates, insight can be gained into these differences that are caused by the particular characteristics of their roads. These differences can be used as evidence in knowledge development and decision support.
Resumo:
Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.
Resumo:
Fibroblasts and their activated phenotype, myofibroblasts, are the primary cell types involved in the contraction associated with dermal wound healing. Recent experimental evidence indicates that the transformation from fibroblasts to myofibroblasts involves two distinct processes: the cells are stimulated to change phenotype by the combined actions of transforming growth factor β (TGFβ) and mechanical tension. This observation indicates a need for a detailed exploration of the effect of the strong interactions between the mechanical changes and growth factors in dermal wound healing. We review the experimental findings in detail and develop a model of dermal wound healing that incorporates these phenomena. Our model includes the interactions between TGFβ and collagenase, providing a more biologically realistic form for the growth factor kinetics than those included in previous mechanochemical descriptions. A comparison is made between the model predictions and experimental data on human dermal wound healing and all the essential features are well matched.
Resumo:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.