874 resultados para Jardine Mining


Relevância:

20.00% 20.00%

Publicador:

Resumo:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a novel two-stage information filtering model which combines the merits of term-based and pattern- based approaches to effectively filter sheer volume of information. In particular, the first filtering stage is supported by a novel rough analysis model which efficiently removes a large number of irrelevant documents, thereby addressing the overload problem. The second filtering stage is empowered by a semantically rich pattern taxonomy mining model which effectively fetches incoming documents according to the specific information needs of a user, thereby addressing the mismatch problem. The experiments have been conducted to compare the proposed two-stage filtering (T-SM) model with other possible "term-based + pattern-based" or "term-based + term-based" IF models. The results based on the RCV1 corpus show that the T-SM model significantly outperforms other types of "two-stage" IF models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an automated image‐based safety assessment method for earthmoving and surface mining activities. The literature review revealed the possible causes of accidents on earthmoving operations, investigated the spatial risk factors of these types of accident, and identified spatial data needs for automated safety assessment based on current safety regulations. Image‐based data collection devices and algorithms for safety assessment were then evaluated. Analysis methods and rules for monitoring safety violations were also discussed. The experimental results showed that the safety assessment method collected spatial data using stereo vision cameras, applied object identification and tracking algorithms, and finally utilized identified and tracked object information for safety decision making.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Despite many incidents about fake online consumer reviews have been reported, very few studies have been conducted to date to examine the trustworthiness of online consumer reviews. One of the reasons is the lack of an effective computational method to separate the untruthful reviews (i.e., spam) from the legitimate ones (i.e., ham) given the fact that prominent spam features are often missing in online reviews. The main contribution of our research work is the development of a novel review spam detection method which is underpinned by an unsupervised inferential language modeling framework. Another contribution of this work is the development of a high-order concept association mining method which provides the essential term association knowledge to bootstrap the performance for untruthful review detection. Our experimental results confirm that the proposed inferential language model equipped with high-order concept association knowledge is effective in untruthful review detection when compared with other baseline methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In many applications, e.g., bioinformatics, web access traces, system utilisation logs, etc., the data is naturally in the form of sequences. People have taken great interest in analysing the sequential data and finding the inherent characteristics or relationships within the data. Sequential association rule mining is one of the possible methods used to analyse this data. As conventional sequential association rule mining very often generates a huge number of association rules, of which many are redundant, it is desirable to find a solution to get rid of those unnecessary association rules. Because of the complexity and temporal ordered characteristics of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considered the redundancy problem in rule mining. The main contribution of this research is to propose a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition for the non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. A further experiment has been done to compare the performance of generating non-redundant sequential rules and full sequential rules, meanwhile, performance evaluation of our CSGM and other closed sequential pattern mining or generator mining algorithms has also been conducted. We also use generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Current approaches to the regulation of coal mining activities in Australia have facilitated the extraction of substantial amounts of coal and coal seam gas. The regulation of coal mining activities must now achieve the reduction or mitigation of greenhouse gas emissions in order to address the challenge of climate change and achieve ecologically sustainable development. Several legislative mechanisms currently exist which appear to offer the means to bring about the reduction or mitigation of greenhouse gas emissions from coal mining activities, yet Australia’s emissions from coal mining continue to rise. This article critiques these existing legislative mechanisms and presents recommendations for reform.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Road asset managers are overwhelmed with a high volume of raw data which they need to process and utilise in supporting their decision making. This paper presents a method that processes road-crash data of a whole road network and exposes hidden value inherent in the data by deploying the clustering data mining method. The goal of the method is to partition the road network into a set of groups (classes) based on common data and characterise the class crash types to produce a crash profiles for each cluster. By comparing similar road classes with differing crash types and rates, insight can be gained into these differences that are caused by the particular characteristics of their roads. These differences can be used as evidence in knowledge development and decision support.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.