980 resultados para Information Mining


Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Despite all attempts to prevent fraud, it continues to be a major threat to industry and government. Traditionally, organizations have focused on fraud prevention rather than detection, to combat fraud. In this paper we present a role mining inspired approach to represent user behaviour in Enterprise Resource Planning (ERP) systems, primarily aimed at detecting opportunities to commit fraud or potentially suspicious activities. We have adapted an approach which uses set theory to create transaction profiles based on analysis of user activity records. Based on these transaction profiles, we propose a set of (1) anomaly types to detect potentially suspicious user behaviour, and (2) scenarios to identify inadequate segregation of duties in an ERP environment. In addition, we present two algorithms to construct a directed acyclic graph to represent relationships between transaction profiles. Experiments were conducted using a real dataset obtained from a teaching environment and a demonstration dataset, both using SAP R/3, presently the predominant ERP system. The results of this empirical research demonstrate the effectiveness of the proposed approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes an autonomous navigation system for a large underground mining vehicle. The control architecture is based on a robust reactive wall-following behaviour. To make it purposeful we provide driving hints derived from an approximate nodal-map. For most of the time, the vehicle is driven with weak localization (odometry). This need only be improved at intersections where decisions must be made – a technique we refer to as opportunistic localization. The paper briefly reviews absolute and relative navigation strategies, and describes an implementation of a reactive navigation system on a 30 tonne Load-Haul-Dump truck. This truck has achieved full-speed autonomous operation at an artificial test mine, and subsequently, at a operational underground mine.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Queensland Coal Industry Employees Health Scheme was implemented in 1993 to provide health surveillance for all Queensland coal industry workers. Tt1e government, mining employers and mining unions agreed that the scheme should operate for seven years. At the expiry of the scheme, an assessment of the contribution of health surveillance to meet coal industry needs would be an essential part of determining a future health surveillance program. This research project has analysed the data made available between 1993 and 1998. All current coal industry employees have had at least one health assessment. The project examined how the centralised nature of the Health Scheme benefits industry by identi~)jng key health issues and exploring their dimensions on a scale not possible by corporate based health surveillance programs. There is a body of evidence that indicates that health awareness - on the scale of the individual, the work group and the industry is not a part of the mining industry culture. There is also growing evidence that there is a need for this culture to change and that some change is in progress. One element of this changing culture is a growth in the interest by the individual and the community in information on health status and benchmarks that are reasonably attainable. This interest opens the way for health education which contains personal, community and occupational elements. An important element of such education is the data on mine site health status. This project examined the role of health surveillance in the coal mining industry as a tool for generating the necessary information to promote an interest in health awareness. The Health Scheme Database provides the material for the bulk of the analysis of this project. After a preliminary scan of the data set, more detailed analysis was undertaken on key health and related safety issues that include respiratory disorders, hearing loss and high blood pressure. The data set facilitates control for confounding factors such as age and smoking status. Mines can be benchmarked to identify those mines with effective health management and those with particular challenges. While the study has confirmed the very low prevalence of restrictive airway disease such as pneu"moconiosis, it has demonstrated a need to examine in detail the emergence of obstructive airway disease such as bronchitis and emphysema which may be a consequence of the increasing use of high dust longwall technology. The power of the Health Database's electronic data management is demonstrated by linking the health data to other data sets such as injury data that is collected by the Department of l\1mes and Energy. The analysis examines serious strain -sprain injuries and has identified a marked difference between the underground and open cut sectors of the industry. The analysis also considers productivity and OHS data to examine the extent to which there is correlation between any pairs ofJpese and previously analysed health parameters. This project has demonstrated that the current structure of the Coal Industry Employees Health Scheme has largely delivered to mines and effective health screening process. At the same time, the centralised nature of data collection and analysis has provided to the mines, the unions and the government substantial statistical cross-sectional data upon which strategies to more effectively manage health and relates safety issues can be based.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Keyword Spotting is the task of detecting keywords of interest within continu- ous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unre- stricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have su®ered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the ¯eld of keyword spotting. The ¯rst major contribution is the development of a novel keyword veri¯cation method named Cohort Word Veri¯cation. This method combines high level lin- guistic information with cohort-based veri¯cation techniques to obtain dramatic improvements in veri¯cation performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique aug- ments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains signi¯cant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple veri¯er fusion for the task of keyword veri¯cation. The reported experiments demonstrate that substantial improvements in veri¯cation performance can be obtained through the fusion of multiple keyword veri¯ers. The research focuses on combinations of speech background model based veri¯ers and cohort word veri¯ers. The ¯nal major contribution is a comprehensive study of the e®ects of limited training data for keyword spotting. This study is performed with consideration as to how these e®ects impact the immediate development and deployment of speech technologies for non-English languages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has lead to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques; including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large complex databases. Application of data mining to manufacturing is relatively limited mainly because of complexity of manufacturing data. Growing self organizing map (GSOM) algorithm has been proven to be an efficient algorithm to analyze unsupervised DNA data. However, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper a data mining methodology has been proposed using a GSOM tool which was developed using a modified GSOM algorithm. The proposed method is used to generate clusters for good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposed an automatic identification of variables to find the most probable causative factor(s) that discriminate between good and faulty product by quickly examining the historical manufacturing data. The proposed method offers the manufacturers to smoothen the production flow and improve the quality of the products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Item folksonomy or tag information is a kind of typical and prevalent web 2.0 information. Item folksonmy contains rich opinion information of users on item classifications and descriptions. It can be used as another important information source to conduct opinion mining. On the other hand, each item is associated with taxonomy information that reflects the viewpoints of experts. In this paper, we propose to mine for users’ opinions on items based on item taxonomy developed by experts and folksonomy contributed by users. In addition, we explore how to make personalized item recommendations based on users’ opinions. The experiments conducted on real word datasets collected from Amazon.com and CiteULike demonstrated the effectiveness of the proposed approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase) based approaches should perform better than the term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents an automated image‐based safety assessment method for earthmoving and surface mining activities. The literature review revealed the possible causes of accidents on earthmoving operations, investigated the spatial risk factors of these types of accident, and identified spatial data needs for automated safety assessment based on current safety regulations. Image‐based data collection devices and algorithms for safety assessment were then evaluated. Analysis methods and rules for monitoring safety violations were also discussed. The experimental results showed that the safety assessment method collected spatial data using stereo vision cameras, applied object identification and tracking algorithms, and finally utilized identified and tracked object information for safety decision making.