347 results for Mining extraction


Abstract:

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that the use of word embeddings significantly improves the effectiveness of concept extraction models; we further determine the influence of the corpora used to generate such features. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.

Abstract:

This thesis examines an alternative approach to the allocation of profit, referred to as the unitary taxation approach, which arises from the notion that because a multinational group exists as a single economic entity, it should be taxed as one taxable unit. The plausibility of a unitary taxation regime achieving international acceptance and agreement is highly contestable because of its implementation issues and doubts about its economic and political feasibility. Using a case-study approach focusing on Freeport-McMoRan and Rio Tinto's mining operations in Indonesia, this thesis compares both tax regimes against the criteria for a good tax system: equity, efficiency, neutrality and simplicity. It evaluates key issues that arise when implementing a unitary taxation approach with formulary apportionment in the context of multinational mining firms in Indonesia.

Abstract:

Existing process mining techniques provide summary views of the overall process performance over a period of time, allowing analysts to identify bottlenecks and associated performance issues. However, these tools are not designed to help analysts understand how bottlenecks form and dissolve over time, nor how the formation and dissolution of bottlenecks (and associated fluctuations in demand and capacity) affect the overall process performance. This paper presents an approach to analyze the evolution of process performance via a notion of Staged Process Flow (SPF). An SPF abstracts a business process as a series of queues corresponding to stages. The paper defines a number of stage characteristics and visualizations that collectively allow process performance evolution to be analyzed from multiple perspectives. The approach has been implemented in the ProM process mining framework. The paper demonstrates the advantages of the SPF approach over state-of-the-art process performance mining tools using two publicly available real-life event logs.
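
To illustrate the idea of viewing a process as a series of queues, one per stage, the following minimal sketch counts how many cases sit in each stage at a set of sample times. It is not the paper's method and not the ProM implementation: the record layout, the stage names, the sample log and the function `queue_lengths` are all hypothetical.

```python
from collections import defaultdict


def queue_lengths(intervals, times):
    """Count the cases present in each stage at each sample time.

    intervals: iterable of (case_id, stage, enter_time, exit_time)
    times: list of sample times
    Returns {stage: [count at times[0], count at times[1], ...]}.
    """
    counts = defaultdict(lambda: [0] * len(times))
    for _case, stage, enter, leave in intervals:
        for i, t in enumerate(times):
            # A case occupies the stage on the half-open interval [enter, leave).
            if enter <= t < leave:
                counts[stage][i] += 1
    return dict(counts)


# Hypothetical log: three cases moving through two stages.
log = [
    ("c1", "triage", 0, 4),
    ("c2", "triage", 1, 3),
    ("c1", "review", 4, 9),
    ("c2", "review", 3, 5),
    ("c3", "triage", 2, 8),
]
lengths = queue_lengths(log, times=[0, 2, 4, 6])
print(lengths)
```

A rising count in one stage while the next stage stays flat is the kind of pattern that would suggest a bottleneck forming at that stage.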

Abstract:

Environmental changes have put great pressure on biological systems, leading to the rapid decline of biodiversity. To monitor this change and protect biodiversity, animal vocalizations have been widely explored with the aid of acoustic sensors deployed in the field. Consequently, large volumes of acoustic data are collected. However, traditional manual methods that require ecologists to physically visit sites to collect biodiversity data are both costly and time consuming. It is therefore essential to develop new semi-automated and automated methods to identify species in audio recordings. In this study, a novel feature extraction method based on wavelet packet decomposition is proposed for frog call classification. After syllable segmentation, each syllable of a frog's advertisement call is represented by a spectral peak track, from which track duration, dominant frequency and oscillation rate are calculated. Then, a k-means clustering algorithm is applied to the dominant frequency, and the centroids of the clustering results are used to generate the frequency scale for wavelet packet decomposition (WPD). Next, a new feature set named adaptive frequency scaled wavelet packet decomposition sub-band cepstral coefficients is extracted by performing WPD on the windowed frog calls. Furthermore, the statistics of all feature vectors over each windowed signal are calculated to produce the final feature set. Finally, two well-known classifiers, a k-nearest neighbour classifier and a support vector machine classifier, are used for classification. In our experiments, we use two different datasets from Queensland, Australia (18 frog species from commercial recordings and 8 frog species from James Cook University field recordings). The weighted classification accuracy of our proposed method is 99.5% for the 18 frog species and 97.4% for the 8 frog species, which outperforms all other comparable methods.
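
The clustering step described above can be sketched as follows. This is a minimal illustration of plain k-means on scalar dominant frequencies, assuming a simple deterministic initialisation; the frequency values and the function `kmeans_1d` are hypothetical, and the sketch stops at the centroids because the abstract does not say how they are turned into the WPD frequency scale.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on scalars; returns the centroids in ascending order."""
    data = sorted(values)
    # Assumed initialisation: pick k points spread evenly across the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical dominant frequencies (Hz), one per segmented syllable.
freqs = [510, 495, 520, 1480, 1510, 1525, 3000, 2950, 3050]
centres = kmeans_1d(freqs, k=3)
print(centres)
```

Each centroid marks a frequency region where calls concentrate, which is the information an adaptive frequency scale would need.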

Abstract:

User-generated information such as product reviews has been booming due to the advent of Web 2.0. In particular, rich information associated with reviewed products is buried in such big data. To facilitate identifying useful information in reviews of products (e.g., cameras), opinion mining has been proposed and widely used in recent years. As the most critical step of opinion mining, feature extraction aims to extract significant product features from review texts. However, most existing approaches find only individual features rather than identifying the hierarchical relationships between the product features. In this paper, we propose an approach that finds both features and feature relationships, structured as a feature hierarchy, which is referred to as a feature taxonomy in the remainder of the paper. Specifically, by making use of frequent patterns and association rules, we construct the feature taxonomy to profile the product at multiple levels instead of a single level, which provides more detailed information about the product. Experiments conducted on real-world review datasets show that our proposed method is capable of identifying product features and relations effectively.

Abstract:

A central tenet in the theory of reliability modelling is the quantification of the probability of asset failure. In general, reliability depends on asset age and the maintenance policy applied. Usually, failure and maintenance times are the primary inputs to reliability models. However, for many organisations, different aspects of these data are often recorded in different databases (e.g. work order notifications, event logs, condition monitoring data, and process control data). These recorded data cannot be interpreted individually, since they typically do not contain all the information necessary to ascertain failure and preventive maintenance times. This paper presents a methodology for the extraction of failure and preventive maintenance times using commonly available, real-world data sources. A text-mining approach is employed to extract keywords indicative of the source of the maintenance event. Using these keywords, a Naïve Bayes classifier is then applied to attribute each machine stoppage to one of two classes: failure or preventive. The accuracy of the algorithm is assessed and the classified failure time data are then presented. The applicability of the methodology is demonstrated on a maintenance data set from an Australian electricity company.
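
As a rough illustration of the classification step, the sketch below trains a multinomial Naïve Bayes classifier with add-one smoothing on keyword lists and assigns a stoppage to "failure" or "preventive". The keywords, the training records and the functions `train` and `classify` are hypothetical; the abstract does not specify the Naïve Bayes variant or the features actually used.

```python
import math
from collections import Counter, defaultdict


def train(records):
    """records: list of (keywords, label). Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in records:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(model, words):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in words:
            if w in vocab:  # keywords never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical keywords extracted from work-order text.
training = [
    (["tripped", "fault", "repair"], "failure"),
    (["broken", "fault", "replace"], "failure"),
    (["scheduled", "inspection", "service"], "preventive"),
    (["planned", "service", "lubricate"], "preventive"),
]
model = train(training)
prediction = classify(model, ["fault", "repair"])
print(prediction)
```

Working in log-probabilities avoids numerical underflow when a work order carries many keywords.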

Abstract:

This study presents a comprehensive mathematical model for a short-term open-pit mine block sequencing problem, which considers nearly all relevant technical aspects of open-pit mining. The proposed model aims to obtain the optimum extraction sequences of the original-size (smallest) blocks over short time intervals and in the presence of real-life constraints, including precedence relationships, machine capacity, grade requirements, processing demands and stockpile management. A hybrid branch-and-bound and simulated annealing algorithm is developed to solve the problem. Computational experiments show that the proposed methodology is a promising way to provide quantitative recommendations for mine planning and scheduling engineers.

Abstract:

Organochlorine pesticides (OCPs) are ubiquitous environmental contaminants with adverse impacts on aquatic biota, wildlife and human health even at low concentrations. However, conventional methods for their determination in river sediments are resource intensive. This paper presents an approach that is both rapid and reliable for the detection of OCPs. Accelerated Solvent Extraction (ASE) with in-cell silica gel clean-up followed by Triple Quadrupole Gas Chromatography Mass Spectrometry (GC-MS/MS) was used to recover OCPs from sediment samples. Variables such as temperature, solvent ratio, adsorbent mass and extraction cycle were evaluated and optimised for the extraction. With the exception of Aldrin, which was unaffected by any of the variables evaluated, the recovery of OCPs from sediment samples was largely influenced by solvent ratio and adsorbent mass and, to some extent, by the number of cycles and temperature. The optimised conditions for OCP extraction from sediment with good recoveries were determined to be 4 cycles, 4.5 g of silica gel, 105 °C, and a 4:3 v/v DCM:hexane mixture. With the exception of two compounds (α-BHC and Aldrin) whose recoveries were low (59.73% and 47.66% respectively), the recoveries of the other pesticides were in the range 85.35% to 117.97% with precision < 10% RSD. The method developed significantly reduces sample preparation time, the amount of solvent used and matrix interference, and is highly sensitive and selective.
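
The recovery and precision figures quoted above follow standard definitions: percent recovery is the measured amount divided by the spiked amount, and RSD is the sample standard deviation divided by the mean, both times 100. The sketch below works through that arithmetic. The spike level and replicate values are hypothetical and are not data from the study.

```python
import statistics


def percent_recovery(measured, spiked):
    """Recovery (%) = 100 * measured amount / spiked amount."""
    return 100.0 * measured / spiked


def percent_rsd(values):
    """Relative standard deviation (%) = 100 * sample std dev / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicates (ng/g) for a hypothetical 50 ng/g spike.
replicates = [46.0, 48.0, 50.0]
recoveries = [percent_recovery(m, 50.0) for m in replicates]
rsd = percent_rsd(recoveries)
print(recoveries, round(rsd, 2))
```

Under these assumed values the recoveries fall within 85% to 118% and the RSD is below 10%, which is the kind of result the abstract reports for most compounds.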

Abstract:

This thesis increased researchers' understanding of the relationship between operations and maintenance in underground longwall coal mines, using data from a Queensland underground coal mine. The thesis explores various relationships between recorded variables. Issues with human-recorded data were uncovered, and the results emphasised the significance of variables associated with conveyor operation in explaining production.

Abstract:

This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak adopts a novel approach to feature extraction using edge detection and image compression, whose output is fed to an Artificial Neural Network that recognizes the gesture. This technique also handles blurred images. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.

Abstract:

Multi-document summarization, which addresses the problem of information overload, has been widely utilized in various real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum can generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the data of the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of our proposed approach.

Abstract:

The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset, and report on experimental results.

Abstract:

The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph addresses the inability to represent other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process to show how this method can provide more precise results than other representations.

Abstract:

With the development of wearable and mobile computing technology, more and more people are using sleep-tracking tools to collect personal sleep data on a daily basis, aiming to understand and improve their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analysing those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis of personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play different roles in the sleep of different people, and that SleepExplorer can help users discover the factors that are most relevant to their personal sleep.
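
To make the association rule idea concrete, the sketch below mines rules of the form "contextual factors imply good sleep" from daily observations using support and confidence thresholds. It is a brute-force illustration, not SleepExplorer's algorithm: the factor names, thresholds, sample days and the function `mine_rules` are hypothetical.

```python
from itertools import combinations


def mine_rules(days, target, min_support=0.3, min_confidence=0.7):
    """Find rules {factors} -> target over a list of daily observation sets.

    Returns a list of (antecedent, support, confidence) tuples, where support
    is the share of days with antecedent and target together, and confidence
    is the share of antecedent days that also contain the target.
    """
    n = len(days)
    factors = sorted(set().union(*days) - {target})
    rules = []
    for size in (1, 2):  # antecedents of one or two factors only
        for antecedent in combinations(factors, size):
            wanted = set(antecedent)
            with_a = [d for d in days if wanted <= d]
            both = [d for d in with_a if target in d]
            support = len(both) / n
            if not with_a or support < min_support:
                continue
            confidence = len(both) / len(with_a)
            if confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


# Hypothetical daily records for one user.
days = [
    {"exercise", "no_caffeine", "good_sleep"},
    {"exercise", "good_sleep"},
    {"late_meal", "poor_sleep"},
    {"exercise", "late_meal", "good_sleep"},
    {"no_caffeine", "poor_sleep"},
]
rules = mine_rules(days, target="good_sleep")
print(rules)
```

Running the same procedure on another user's records could surface different rules, which matches the finding that the same factor can matter differently from person to person.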

Abstract:

The Social Water Assessment Protocol (SWAP) is a tool consisting of a series of questions on fourteen themes designed to capture the social context of water around a mine site. A pilot study of the SWAP, conducted in Prestea-Huni Valley, Ghana, showed that some communities were concerned about whether the groundwater was potable. The mining company’s concern was that there was a cycle of dependency amongst communities that received treated water from the mining company. The pilot identified potential data sources and stakeholder groups for each theme, gaps in themes and suggested refinements to questions to improve the SWAP.