970 results for 650200 Mining and Extraction


Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. The technique discovers both positive and negative patterns in text documents as higher-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern-based methods on precision, recall and F measures.
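For reference, the precision, recall and F measures named in this abstract can be computed as in the minimal sketch below. The function name and the toy document IDs are illustrative assumptions and are not taken from the paper, which does not describe its evaluation code.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for retrieved vs. relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: 4 documents retrieved, 3 truly relevant, 2 in common.
p, r, f = precision_recall_f1([1, 2, 3, 4], [2, 4, 5])
```

Working on sets keeps the sketch independent of ranking; ranked measures such as average precision would need the retrieval order as well.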

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques have been used to derive this information. Mining XML documents is affected by the document model chosen, because of the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models have been used for mining, and some of the issues and challenges in these models. In addition, this chapter provides some insights into future models of XML documents for effectively capturing the two important features of XML documents for mining, namely structure and content.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we describe the main processes and operations in mining industries and present a comprehensive survey of operations research methodologies that have been applied over the last several decades. The literature review is classified into four main categories: mine design; mine production; mine transportation; and mine evaluation. Mine design models are further separated according to two main mining methods: open-pit and underground. Mine production models are subcategorised into two groups: ore mining and coal mining. Mine transportation models are further partitioned into fleet management, truck haulage and train scheduling. Mine evaluation models are further subdivided into four clusters: mining method selection, quality control, financial risks and environmental protection. The main characteristics of four Australian commercial mining software packages are addressed and compared. This paper bridges the gaps in the literature and motivates researchers to develop more applicable, realistic and comprehensive operations research models and solution techniques that are directly linked with mining industries.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Key decisions at the collection, pre-processing, transformation, mining and interpretation phases of any knowledge discovery from databases (KDD) process depend heavily on assumptions and theoretical perspectives relating to the type of task to be performed and the characteristics of the data sourced. In this article, we compare and contrast the theoretical perspectives and assumptions taken in data mining exercises in the legal domain with those adopted in data mining in Traditional Chinese Medicine (TCM) and allopathic medicine. The juxtaposition results in insights for the application of KDD to TCM.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis is a study of the automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can largely reduce noise in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for the study of Data Mining and Web Intelligence.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Opinion mining has been widely used to identify the strengths and weaknesses of products (e.g., cameras) or services (e.g., services in medical clinics or hospitals) based upon people's feedback, such as user reviews. Feature extraction, which is used to collect useful information from user reviews, is a crucial step in opinion mining. Most existing approaches find only individual features of a product, without the structural relationships that usually exist between the features. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called a feature hierarchy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature hierarchy profiles the product at multiple levels and provides more detailed information about the product. Our experimental results on some widely used review datasets show that the proposed feature extraction approach can identify more correct features than the baseline model. Even though the datasets used in the experiment are about cameras, our work can be applied to generate features about a service, such as the services in hospitals or clinics.
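To make "frequent patterns derived from user reviews" concrete, the sketch below counts candidate feature terms and term pairs that co-occur in at least a minimum number of review sentences. It is a generic frequent-itemset count under assumed inputs; the function name, the support threshold and the toy sentences are hypothetical, and this is not the paper's hierarchy-construction algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(transactions, min_support=2, max_size=2):
    """Count term sets (up to max_size terms) that occur in >= min_support transactions."""
    counts = Counter()
    for terms in transactions:
        items = sorted(set(terms))  # de-duplicate and fix an order within each sentence
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical candidate feature terms extracted from three review sentences.
reviews = [
    ["camera", "lens", "zoom"],
    ["camera", "lens"],
    ["camera", "battery"],
]
patterns = frequent_patterns(reviews)
```

A frequent pair such as ("camera", "lens") is the kind of association from which a parent-child link in a feature hierarchy could be proposed; how the paper actually derives those links is not stated in the abstract.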

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The mining industry has positioned itself within the sustainability agenda, particularly since the establishment of the International Council on Mining and Metals (ICMM). However, some critics have questioned this position, since mining requires the extraction of non-renewable finite resources and commercial mining companies have the specific responsibility to produce profit. Complicating matters, terms that represent sustainability, such as ‘sustainability’ and ‘sustainable development’, have multiple definitions with varying degrees of sophistication. This work identifies eleven sustainability agenda definitions that are applicable to the mining industry and organises them into three tiers: first, Perpetual Sustainability, which focuses on mining continuing indefinitely with its benefits limited to immediate shareholders; second, Transferable Sustainability, which focuses on how mining can benefit society and the environment; and third, Transitional Sustainability, which focuses on the intergenerational benefits to society and the environment even after mining ceases. Using these definitions, a discourse analysis was performed on sustainability reports from member companies of the ICMM and on the academic journal Resources Policy. The discourse analysis showed that in both media the definition of the sustainability agenda was focussed on Transferable Sustainability, with the sustainability reports focused on how it can be applied within a business context while the academic journal took a broader view of mining’s social and environmental impacts.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Commercially viable carbon-neutral biodiesel production from microalgae has potential for replacing depleting petroleum diesel. The process of biodiesel production from microalgae involves harvesting, drying and extraction of lipids, which are energy- and cost-intensive processes. The development of effective large-scale lipid extraction processes which overcome the complexity of microalgae cell structure is considered one of the most vital requirements for commercial production. Thus the aim of this work was to investigate suitable extraction methods with optimised conditions to progress opportunities for sustainable microalgal biodiesel production. In this study, the green microalgal species consortium, Tarong polyculture, was used to investigate lipid extraction with hexane (solvent) under high pressure and variable temperature and biomass moisture conditions using an Accelerated Solvent Extraction (ASE) method. The performance of high pressure solvent extraction was examined over a range of different process and sample conditions (dry biomass to water ratios (DBWRs): 100%, 75%, 50% and 25%; temperatures from 70 to 120 °C; process time 5–15 min). Maximum total lipid yields were achieved at 50% and 75% sample dryness at temperatures of 90–120 °C. We show that the extraction optima of individual fatty acids (palmitic acid C16:0; stearic acid C18:0; oleic acid C18:1; linolenic acid C18:3) are influenced by temperature and sample dryness, consequently affecting microalgal biodiesel quality parameters. Higher heating values and kinematic viscosity were compliant with biodiesel quality standards under all extraction conditions used. Our results indicate that biodiesel quality can be positively manipulated by selecting process extraction conditions that favour extraction of saturated and mono-unsaturated fatty acids over optimal extraction conditions for polyunsaturated fatty acids, yielding positive effects on cetane number and iodine values. Exceeding biodiesel standards for these two parameters opens blending opportunities with biodiesels that fall outside the minimal cetane and maximal iodine values.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of large-scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, it has often been hypothesised that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough in this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

On our first day in Kalgoorlie, a local woman in her mid-thirties tells us that ‘Kal wouldn’t exist if it wasn’t for mining and prostitution’. In the ensuing days many others would tell us the same thing. More explicitly, in the words of another local resident, ‘The town was founded on brothels. [Without them] the men wouldn’t have been happy and they wouldn’t have got as much gold.’ These two phenomena – mining and prostitution – and their seemingly natural and straightforward connection to each other are also routinely invoked in tourist and popular culture depictions of Kalgoorlie. The Lonely Planet, for example, notes that ‘historically, mineworkers would come straight to town to spend disposable income at Kalgoorlie’s infamous brothels, or at pubs staffed by “skimpies” (scantily clad female bar staff)’.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Product reviews are the foremost source of information for customers and manufacturers to help them make appropriate purchasing and production decisions. Natural language data is typically very sparse; the most common words are those that do not carry a lot of semantic content, and occurrences of any particular content-bearing word are rare, while co-occurrences of these words are rarer. Mining product aspects, along with corresponding opinions, is essential for Aspect-Based Opinion Mining (ABOM) as a result of the e-commerce revolution. Therefore, the need for automatic mining of reviews has reached a peak. In this work, we treat ABOM as a sequence labelling problem and propose a supervised extraction method to identify product aspects and corresponding opinions. We use Conditional Random Fields (CRFs) to solve the extraction problem and propose a feature function to enhance accuracy. The proposed method is evaluated using two different datasets. We also evaluate the effectiveness of the feature function and the optimisation through multiple experiments.
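As an illustration of what treating aspect and opinion extraction as sequence labelling looks like, the sketch below pairs each token of a review sentence with a BIO-style tag and builds a per-token feature dictionary of the kind commonly fed to a CRF. The tag names, the feature set and the example sentence are assumptions made for illustration; the abstract does not specify the paper's feature function.

```python
def token_features(tokens, i):
    """Build a simple, hypothetical feature dictionary for the token at position i."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        # Neighbouring words give the labeller local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# One training sentence with assumed BIO tags: B-/I- mark the beginning/inside
# of an aspect or opinion span, and O marks tokens outside any span.
tokens = ["The", "battery", "life", "is", "great"]
labels = ["O", "B-ASPECT", "I-ASPECT", "O", "B-OPINION"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

A CRF trained on many such (features, labels) pairs scores whole tag sequences rather than individual tokens, which is what lets it keep multi-word aspects such as "battery life" together.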

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents classification, representation and extraction of deformation features in sheet-metal parts. The thickness is constant for these shape features and hence these are also referred to as constant thickness features. The deformation feature is represented as a set of faces with a characteristic arrangement among the faces. Deformation of the base-sheet or forming of material creates Bends and Walls with respect to a base-sheet or a reference plane. These are referred to as Basic Deformation Features (BDFs). Compound deformation features having two or more BDFs are defined as characteristic combinations of Bends and Walls and represented as a graph called Basic Deformation Features Graph (BDFG). The graph, therefore, represents a compound deformation feature uniquely. The characteristic arrangement of the faces and type of bends belonging to the feature decide the type and nature of the deformation feature. Algorithms have been developed to extract and identify deformation features from a CAD model of sheet-metal parts. The proposed algorithm does not require folding and unfolding of the part as intermediate steps to recognize deformation features. Representations of typical features are illustrated and results of extracting these deformation features from typical sheet metal parts are presented and discussed. (C) 2013 Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A study was conducted in the Tebuwana Wathurana Wetland ecosystem to understand its vegetation structure and faunal composition in order to assess its conservation needs. As there are no published records on the flora and fauna of the Wathurana Wetlands in Tebuwana, it is necessary to understand the ecological and other relevant features in order to develop strategies to conserve this wetland. These objectives were pursued by surveying the vegetation of the wetland and by identifying the fish and bird species present. A total of 66 species of flora and 61 species of fauna were identified in the survey. Of the 27 fish species recorded from the Tebuwana Wetland, 9 species were endemic and 17 species belonged to the indigenous category. With regard to the flora in the wetlands, the dominant families were Rubiaceae, Fabaceae and Arecaceae. The 66 species belonged to 39 families and 61 genera, while 12 species were endemic and 4 species were considered highly threatened. These flora were found in four layers. Of the 22 species of birds recorded, two species were endemic. This study revealed that the Wathurana Wetlands have a high species diversity but face many threats, including encroachment, extraction of forest products (mainly timber), land filling, mining and the occurrence of invasive species. It is essential to minimize the exploitation of natural resources from this wetland in the future and in particular to mark the boundary, conduct awareness programmes and continue research.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This research proposes a method for extracting technology intelligence (TI) systematically from a large set of document data. To do this, the internal and external sources in the form of documents, which might be valuable for TI, are first identified. Then the existing techniques and software systems applicable to document analysis are examined. Finally, based on the reviews, a document-mining framework designed for TI is suggested and guidelines for software selection are proposed. The research output is expected to support intelligence operatives in finding suitable techniques and software systems for getting value from document-mining and thus facilitate effective knowledge management. Copyright © 2012 Inderscience Enterprises Ltd.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, the extractabilities of Cyanex 302 and purified Cyanex 302 (hereafter HBTMPTP or HA) in heptane have been compared by extracting scandium, yttrium, lanthanum, and gadolinium from hydrochloric acid solutions. The roles of the different components of Cyanex 302 in lanthanum extraction have been analyzed. The results demonstrate that Cyanex 302 has a higher extractability than HBTMPTP, which perhaps originates from the interaction among the components in Cyanex 302. Especially for R3PO, an obvious synergistic effect can be observed in the lower pH range, and the extraction mechanism of lanthanum using the mixture of HBTMPTP and TOPO has been deduced to be: where (HA)₂ and B denote the dimeric form of HBTMPTP and TOPO, respectively. At the same time, the separation abilities of Cyanex 302 and HBTMPTP for the rare earth elements have been compared. The effect of temperature on the extraction with Cyanex 302, HBTMPTP and the mixture of HBTMPTP and TOPO has also been discussed, with the thermodynamic functions ΔH, ΔS, and ΔG calculated.
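The three thermodynamic functions named at the end of this abstract are linked by the standard relations below. This is textbook thermodynamics given for orientation; the abstract does not say which relations or which equilibrium constant the authors used, so the symbol K (an equilibrium constant for the extraction reaction) is an assumption here, with T the absolute temperature and R the gas constant.

```latex
\[
\Delta G = \Delta H - T\,\Delta S ,
\qquad
\Delta G = -RT \ln K
\]
```

Combining the two gives $\ln K = -\Delta H/(RT) + \Delta S/R$, so a plot of $\ln K$ against $1/T$ has slope $-\Delta H/R$ and intercept $\Delta S/R$ when both are approximately constant over the temperature range.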