25 results for Text Mining, Classificazione semantica, Documenti

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00%

Publisher:

Abstract:

Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer’s complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. As data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution to efficiently analyse bulk textual feedback data. Despite replacement of the same parts and/or sub-parts, the actual service cost for the same repair is often distinctly different from that of similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to help provide additional information concerning the reason for cost variation. Professional terms and conventions were included within the dictionary to avoid ambiguity and improve the results. Testing results show that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. This solution has been effectively used by a large aircraft MRO agency with positive results.
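The combination of regular expressions with a domain glossary described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glossary entries, the `P/N` part-number convention and the function name `normalize_report` are all assumptions made for illustration only.

```python
import re

# Hypothetical glossary: maps free-text variants to one canonical defect term.
GLOSSARY = {
    "corr": "corrosion",
    "corroded": "corrosion",
    "corrosion": "corrosion",
    "crack": "crack",
    "cracked": "crack",
    "worn": "wear",
    "wear": "wear",
}

# Assumed report convention: part numbers are written as "P/N <id>".
PART_RE = re.compile(r"\bP/N\s*([A-Z0-9-]+)", re.IGNORECASE)

# One alternation over all glossary keys, longest first so that
# "corroded" is preferred over the shorter variant "corr".
DEFECT_RE = re.compile(
    r"\b(" + "|".join(sorted(GLOSSARY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalize_report(text):
    """Extract part numbers and canonical defect terms from a free-text report."""
    parts = [p.upper() for p in PART_RE.findall(text)]
    defects = sorted({GLOSSARY[d.lower()] for d in DEFECT_RE.findall(text)})
    return {"part_numbers": parts, "defects": defects}
```

Calling `normalize_report` on a sentence such as "Bearing P/N 12-AB34 found corroded and cracked; seal worn." yields one part number and three canonical defect terms, which is the kind of normalized record that downstream analysis tools can consume.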

Relevance:

100.00%

Publisher:

Abstract:

Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data based quotation systems to select the best suppliers for the customers (airlines). Data quantity and quality become a key issue in determining the success of an MRO job, since cost and quality benchmarks must be met. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality, and enables significantly more precise MRO job quotations. Regular expressions were used to analyse descriptive textual feedback (i.e. engineers’ reports) in order to extract more referable, highly normalised data for job quotation. A text mining based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.

Relevance:

30.00%

Publisher:

Abstract:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
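The idea of receiver-initiated load balancing over an irregular search tree can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not the algorithm from the paper: the toy tree in `expand`, the "ask the most loaded peer" rule and the "take half of the donor's oldest tasks" rule are all hypothetical choices, and direct queue access stands in for peer-to-peer messages.

```python
from collections import deque


def expand(node):
    """Hypothetical irregular search tree: fan-out varies from 0 to 3 per node."""
    depth, label = node
    if depth >= 6:
        return []
    fanout = (label * 7 + depth) % 4  # uneven by design, so subtree sizes differ
    return [(depth + 1, label * 4 + i + 1) for i in range(fanout)]


def run(num_workers, root):
    """Simulate workers in rounds; return how many nodes each worker processed."""
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * num_workers
    while any(queues):
        for w, q in enumerate(queues):
            if not q:
                # Receiver-initiated: the idle worker asks for work itself.
                donor = max(range(num_workers), key=lambda i: len(queues[i]))
                if len(queues[donor]) > 1:
                    # Take half of the donor's oldest tasks (the shallowest
                    # nodes, which tend to root the largest subtrees).
                    for _ in range(len(queues[donor]) // 2):
                        q.append(queues[donor].popleft())
            if q:
                node = q.pop()  # depth-first on the local queue
                processed[w] += 1
                q.extend(expand(node))
    return processed
```

Running `run(4, (0, 1))` and `run(1, (0, 1))` visits the same total number of nodes; only the split across workers changes. Because no workload prediction is assumed, work moves only when a worker actually runs dry, which is the point of letting the receiver initiate the transfer.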

Relevance:

30.00%

Publisher:

Abstract:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability.

Relevance:

30.00%

Publisher:

Abstract:

Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology; the second approach, a completely distributed peer-to-peer system, solves the scalability problem caused by the bottleneck at the master node. However, in many real-world scenarios the participating computing nodes cannot communicate directly due to administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To overcome this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogeneous computing resources and is applied to wide area network scenarios.

Relevance:

30.00%

Publisher:

Abstract:

In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets with enormous size, high dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is a need to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevance:

30.00%

Publisher:

Abstract:

Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.

Relevance:

30.00%

Publisher:

Abstract:

Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.

Relevance:

30.00%

Publisher:

Abstract:

This paper provides an extended analysis of livelihood diversification in rural Tanzania, with special emphasis on artisanal and small-scale mining (ASM). Over the past decade, this sector of industry, which is labour-intensive and comprises an array of rudimentary and semi-mechanized operations, has become an indispensable economic activity throughout Sub-Saharan Africa, providing employment to a host of redundant public sector workers, retrenched large-scale mine labourers and poor farmers. In many of the region’s rural areas, it is overtaking subsistence agriculture as the primary industry. Such a pattern appears to be unfolding within the Morogoro and Mbeya regions of southern Tanzania, where findings from recent research suggest that a growing number of smallholder farmers are turning to ASM for employment and financial support. It is imperative that national rural development programmes take this trend into account and provide support to these people.

Relevance:

30.00%

Publisher:

Abstract:

This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijden’s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbase’s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda.

Relevance:

30.00%

Publisher:

Abstract:

The governance of water resources is prominent in both water policy agendas and academic scholarship. Political ecologists have made important advances in reconceptualising the relationship between water and society. Yet, while they have stressed both the scalar dimensions and the politicised nature of water governance, analyses of its scalar politics are relatively nascent. In this paper, we consider how the increased demand for water resources by the growing mining industry in Peru reconfigures and rescales water governance. In Peru, the mining industry’s thirst for water draws in, and reshapes, social relations, technologies, institutions and discourses that operate over varying spatial and temporal scales. We develop the concept of waterscape to examine the multiple ways in which water is co-produced through mining and becomes embedded in changing modes and structures of water governance, often beyond the watershed scale. We argue that an examination of waterscapes avoids the limitations of thinking about water in purely material terms, structuring analysis of water issues according to traditional spatial scales and institutional hierarchies, and taking these scales and structures for granted.

Relevance:

30.00%

Publisher:

Abstract:

This article clarifies what was done with the sub-7-man positions in data-mining Harold van der Heijden's 'HHdbIV' database of chess studies prior to its publication. It emphasises that only positions in the main lines of studies were examined and that the information about uniqueness of move was not incorporated in HHdbIV. There is some reflection on the separate technical and artistic dimensions of study evaluation.

Relevance:

30.00%

Publisher:

Abstract:

This article examines Corporate Social Responsibility (CSR) and mining community development, sustainability and viability. These issues are considered focussing on current and former company-owned mining towns in Namibia. Historically company towns have been a feature of mining activity in Namibia. However, the fate of such towns upon mine closure has been and remains controversial. Declining former mining communities and even ghost mining towns can be found across the country. This article draws upon research undertaken in Namibia and considers these issues with reference to three case study communities. This article examines the complexities which surround decision-making about these communities, and the challenges faced in efforts to encourage their sustainability after mining. In this article, mine company engagements through CSR with the development, sustainability and viability of such communities are also critically discussed. The role, responsibilities, and actions of the state in relation to these communities are furthermore reflected upon. Finally, ways forward for these communities are considered.

Relevance:

30.00%

Publisher:

Abstract:

This article examines the marginal position of artisanal miners in sub-Saharan Africa, and considers how they are incorporated into mineral sector change in the context of institutional and legal integration. Taking the case of diamond and gold mining in Tanzania, the concept of social exclusion is used to explore the consequences of marginalization for people's access to mineral resources and ability to make a living from artisanal mining. Because existing inequalities and forms of discrimination are ignored by the Tanzanian state, the institutionalization of mineral titles conceals social and power relations that perpetuate highly unequal access to resources. The article highlights the complexity of these processes, and shows that while legal integration can benefit certain wealthier categories of people, who fit into the model of an 'entrepreneurial small-scale miner', for others adverse incorporation contributes to socio-economic dependence, exploitation and insecurity. For the issue of marginality to be addressed within integration processes, the existence of local forms of organization, institutions and relationships, which underpin inequalities and discrimination, needs to be recognized.