875 resultados para Mining
Resumo:
Despite many incidents about fake online consumer reviews have been reported, very few studies have been conducted to date to examine the trustworthiness of online consumer reviews. One of the reasons is the lack of an effective computational method to separate the untruthful reviews (i.e., spam) from the legitimate ones (i.e., ham) given the fact that prominent spam features are often missing in online reviews. The main contribution of our research work is the development of a novel review spam detection method which is underpinned by an unsupervised inferential language modeling framework. Another contribution of this work is the development of a high-order concept association mining method which provides the essential term association knowledge to bootstrap the performance for untruthful review detection. Our experimental results confirm that the proposed inferential language model equipped with high-order concept association knowledge is effective in untruthful review detection when compared with other baseline methods.
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
Resumo:
In many applications, e.g., bioinformatics, web access traces, system utilisation logs, etc., the data is naturally in the form of sequences. People have taken great interest in analysing the sequential data and finding the inherent characteristics or relationships within the data. Sequential association rule mining is one of the possible methods used to analyse this data. As conventional sequential association rule mining very often generates a huge number of association rules, of which many are redundant, it is desirable to find a solution to get rid of those unnecessary association rules. Because of the complexity and temporal ordered characteristics of sequential data, current research on sequential association rule mining is limited. Although several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, none of them considered the redundancy problem in rule mining. The main contribution of this research is to propose a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition for the non-redundant sequential rules, which are sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. A further experiment has been done to compare the performance of generating non-redundant sequential rules and full sequential rules, meanwhile, performance evaluation of our CSGM and other closed sequential pattern mining or generator mining algorithms has also been conducted. We also use generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
Resumo:
Current approaches to the regulation of coal mining activities in Australia have facilitated the extraction of substantial amounts of coal and coal seam gas. The regulation of coal mining activities must now achieve the reduction or mitigation of greenhouse gas emissions in order to address the challenge of climate change and achieve ecologically sustainable development. Several legislative mechanisms currently exist which appear to offer the means to bring about the reduction or mitigation of greenhouse gas emissions from coal mining activities, yet Australia’s emissions from coal mining continue to rise. This article critiques these existing legislative mechanisms and presents recommendations for reform.
Resumo:
Road asset managers are overwhelmed with a high volume of raw data which they need to process and utilise in supporting their decision making. This paper presents a method that processes road-crash data of a whole road network and exposes hidden value inherent in the data by deploying the clustering data mining method. The goal of the method is to partition the road network into a set of groups (classes) based on common data and characterise the class crash types to produce a crash profiles for each cluster. By comparing similar road classes with differing crash types and rates, insight can be gained into these differences that are caused by the particular characteristics of their roads. These differences can be used as evidence in knowledge development and decision support.
Resumo:
Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.
Resumo:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.
Resumo:
Australia is currently in the midst of a major resources boom. Resultant growing demands for labour in regional and remote areas have accelerated the recruitment of non resident workers, mostly contractors, who work extended block rosters of 12-hour shifts and are accommodated in work camps, often adjacent to established mining towns. Serious social impacts of these practices, including violence and crime, have generally escaped industry, government and academic scrutiny. This paper highlights some of these impacts on affected regional communities and workers and argues that post-industrial mining regimes serve to mask and privatize these harms and risks, shifting them on to workers, families and communities.
Resumo:
While the role of executives’ cognition in organisations’ responses to change is a central topic in strategic cognition research, changes in firms’ environment are typically not measured directly but described either as an event (for example, new industry legislation) or represented by a time period (e.g. when a new technology impacted an industry). The Australian mining sector has witnessed a historically significant change in demand for its products and we begin by developing measures of changes in supply and demand for key commodities during the period 1992-2008. We identify sub-groups of firms based on their activities and commodity sector and examine the relation of these variables to executives’ cognition and to firms’ CapEx. We find industry, firm and cognitive variables are related to both strategic cognition and firms’ CapEx.
Resumo:
In the long term, with development of skill, knowledge, exposure and confidence within the engineering profession, rigorous analysis techniques have the potential to become a reliable and far more comprehensive method for design and verification of the structural adequacy of OPS, write Nimal J Perera, David P Thambiratnam and Brian Clark. This paper explores the potential to enhance operator safety of self-propelled mechanical plant subjected to roll over and impact of falling objects using the non-linear and dynamic response simulation capabilities of analytical processes to supplement quasi-static testing methods prescribed in International and Australian Codes of Practice for bolt on Operator Protection Systems (OPS) that are post fitted. The paper is based on research work carried out by the authors at the Queensland University of Technology (QUT) over a period of three years by instrumentation of prototype tests, scale model tests in the laboratory and rigorous analysis using validated Finite Element (FE) Models. The FE codes used were ABAQUS for implicit analysis and LSDYNA for explicit analysis. The rigorous analysis and dynamic simulation technique described in the paper can be used to investigate the structural response due to accident scenarios such as multiple roll over, impact of multiple objects and combinations of such events and thereby enhance the safety and performance of Roll Over and Falling Object Protection Systems (ROPS and FOPS). The analytical techniques are based on sound engineering principles and well established practice for investigation of dynamic impact on all self propelled vehicles. They are used for many other similar applications where experimental techniques are not feasible.