1000 results for Incremental mining


Relevance:

20.00%

Abstract:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
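To make the term "receiver-initiated load balancing" concrete, the following is a minimal single-process sketch, not the algorithm from the paper. It assumes a synthetic irregular search tree (the hypothetical `children` function) and simulates workers in rounds: a worker whose queue is empty asks the most loaded peer for half of its pending nodes. The names `children`, `count_sequential` and `simulate` are illustrative only.

```python
from collections import deque


def children(node):
    """Hypothetical irregular search tree; a node is (depth, label)."""
    depth, label = node
    if depth >= 6:
        return []
    # The fan-out depends on the label, so subtree sizes are uneven
    # and cannot be predicted without expanding the node.
    fanout = (label * 7 + depth) % 4
    return [(depth + 1, label * 4 + i + 1) for i in range(fanout)]


def count_sequential(root):
    """Reference: number of nodes visited by a plain depth-first search."""
    stack, seen = [root], 0
    while stack:
        node = stack.pop()
        seen += 1
        stack.extend(children(node))
    return seen


def simulate(root, n_workers=4):
    """Round-based simulation; returns the node count handled per worker."""
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)
    processed = [0] * n_workers
    while any(queues):
        for w, q in enumerate(queues):
            if q:
                node = q.pop()  # work locally, depth-first
                processed[w] += 1
                q.extend(children(node))
            else:
                # Receiver-initiated: the idle worker requests work.
                donor = max(range(n_workers), key=lambda i: len(queues[i]))
                # The donor hands over half of its pending nodes, oldest
                # first (those tend to sit closer to the root).
                for _ in range(len(queues[donor]) // 2):
                    q.append(queues[donor].popleft())
    return processed
```

Calling `simulate((0, 1))` returns one counter per worker; their sum equals `count_sequential((0, 1))`, since every node is expanded exactly once regardless of which worker ends up holding it.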

Relevance:

20.00%

Abstract:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability.

Relevance:

20.00%

Abstract:

We present a method to enhance fault localization for software systems based on a frequent pattern mining algorithm. Our method is based on a large set of test cases for a given set of programs in which faults can be detected. The test executions are recorded as function call trees. Based on test oracles, the tests can be classified into successful and failing tests. A frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions. This information is used to rank functions according to their likelihood of containing a fault. The ranking suggests an order in which to examine the functions during fault analysis. We validate our approach experimentally using a subset of Siemens benchmark programs.
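The ranking idea can be illustrated with a deliberately simplified sketch. It is not the method of the paper: instead of mining frequent subtrees of call trees, it assumes each test run is reduced to the set of functions it called, and it scores a function by how much more often it appears in failing runs than in passing ones. The function name `rank_functions` and the scoring formula are assumptions made for illustration.

```python
def rank_functions(passing, failing):
    """Rank functions from most to least suspicious.

    `passing` and `failing` are lists of sets of function names, one set
    per test run (a simplification of the call trees in the abstract).
    """
    functions = set().union(*passing, *failing)

    def support(runs, fn):
        # Fraction of runs in which the function was called.
        return sum(fn in run for run in runs) / len(runs) if runs else 0.0

    scores = {
        fn: support(failing, fn) - support(passing, fn) for fn in functions
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda fn: (-scores[fn], fn))


passing_runs = [{"main", "parse"}, {"main", "parse", "emit"}]
failing_runs = [{"main", "parse", "fold"}, {"main", "fold", "emit"}]
ranking = rank_functions(passing_runs, failing_runs)
```

In this toy data, `fold` is called in every failing run and in no passing run, so it is placed first in `ranking`; a developer would inspect it before the other functions.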

Relevance:

20.00%

Abstract:

Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology; the second, a completely distributed peer-to-peer system, solves the scalability problem caused by the bottleneck at the master node. However, in many real-world scenarios the participating computing nodes cannot communicate directly due to administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To overcome this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogeneous computing resources and is applied to wide area network scenarios.
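One way to picture load balancing over a hierarchy is the sketch below. It is an assumption-laden illustration, not the scheme described in the abstract: an idle worker first looks for a donor within its own domain and escalates to the next management level only if no local peer has spare work. The function `find_donor`, the `parent` map and the `load` values are all hypothetical.

```python
def find_donor(idle, parent, load):
    """Return the nearest worker with spare work, or None.

    `parent` maps each node to its manager (the root has no entry);
    `load` maps each worker to its number of pending tasks.
    """
    children = {}
    for node, manager in parent.items():
        children.setdefault(manager, []).append(node)

    def workers_under(node):
        kids = children.get(node, [])
        if not kids:
            return [node]  # a node without children is a worker
        return [w for kid in kids for w in workers_under(kid)]

    node = idle
    while node in parent:
        node = parent[node]  # escalate one management level
        candidates = [
            w for w in workers_under(node)
            if w != idle and load.get(w, 0) > 1  # must keep one task itself
        ]
        if candidates:
            return max(candidates, key=lambda w: load[w])
    return None


topology = {
    "a1": "siteA", "a2": "siteA",
    "b1": "siteB", "b2": "siteB",
    "siteA": "root", "siteB": "root",
}
local = find_donor("a1", topology, {"a1": 0, "a2": 4, "b1": 5, "b2": 3})
remote = find_donor("a1", topology, {"a1": 0, "a2": 1, "b1": 5, "b2": 3})
```

In the first call the request is satisfied inside `siteA`; in the second, `a2` has nothing to spare, so the request climbs to `root` and is served from `siteB`. Keeping requests local when possible limits traffic across domain boundaries.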

Relevance:

20.00%

Abstract:

In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets of enormous size, high dimensionality and complex structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context, high-performance distributed data mining algorithms are needed. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevance:

20.00%

Abstract:

Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.

Relevance:

20.00%

Abstract:

Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.

Relevance:

20.00%

Abstract:

Eight Jersey cows were used in two balanced 4 × 4 Latin Squares to investigate the effects of replacement of dietary starch with non-forage fibre on productivity, diet digestibility and feeding behaviour. Total-mixed rations consisted of maize silage, grass silage and a soyabean meal-based concentrate mixture, each at 250 g/kg DM, with the remaining 250 g consisting of cracked wheat/soya hulls (SH) in the ratios of 250:0, 167:83, 83:167 and 0:250 g, respectively, for treatments SH0, SH83, SH167 and SH250. Starch concentrations were 302, 248, 193 and 140 g/kg DM, and NDF concentrations were 316, 355, 394 and 434 g/kg DM, for treatments SH0, SH83, SH167 and SH250, respectively. Total eating time increased (p < 0.05) as SH inclusion increased, but total rumination time was unaffected. Digestibility of DM, organic matter and starch declined (p < 0.01) as SH inclusion increased, whilst digestibility of NDF and ADF increased (p < 0.01). Dry-matter intake tended to decline with increasing SH, whilst bodyweight, milk yield and fat and lactose concentrations were unaffected by treatment. Milk protein concentration decreased (p < 0.01) as SH level increased. Feed conversion efficiency improved (p < 0.05) as SH inclusion rose, but it was not possible to determine whether this was due to the increased fibre levels alone, or the favourable effect on rumen fermentation of decreasing starch levels. (c) 2006 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Abstract:

This paper provides an extended analysis of the tensions that have surfaced between large-scale mine operators and artisanal miners in gold-rich areas of rural Tanzania. The literature on grievance is used to contextualise these disputes, the underlying cause of which is artisanal miners' mounting frustration over not being able to secure viable concessions to work. Newly implemented legislation has, for the most part, empowered foreign large-scale mine operators, while simultaneously disempowering indigenous small-scale miners. In many cases, the former have addressed mounting security and community problems on their own. Until the country's major mine operators extend assistance to marginalised small-scale mining groups, the likelihood of violent conflict unfolding between these parties will increase.

Relevance:

20.00%

Abstract:

This paper critiques contemporary research and policy approaches taken toward the analysis and abatement of mercury pollution in the small-scale gold mining sector. Unmonitored releases of mercury from gold amalgamation have caused considerable environmental contamination and human health complications in rural reaches of sub-Saharan Africa, Latin America and Asia. Whilst these problems have caught the attention of the scientific community over the past 15-20 years, the research that has since been undertaken has failed to identify appropriate mitigation measures, and has done little to advance understanding of why contamination persists. Moreover, the strategies used to educate operators about the impacts of acute mercury exposure, and the technologies implemented to prevent further pollution, have been marginally effective at best. The mercury pollution problem will not be resolved until governments and donor agencies commit to carrying out research aimed at improving understanding of the dynamics of small-scale gold mining communities. Acquisition of this knowledge is the key to designing and implementing appropriate support and abatement measures. (c) 2005 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Abstract:

The World Bank, United Nations and UK Department for International Development (DfID) have spearheaded a recent global drive to regularize artisanal and small-scale mining (ASM), and provide assistance to its predominantly impoverished participants. To date, millions of dollars have been pledged toward the design of industry-specific policies and regulations; implementation of mechanized equipment; extension; and the launch of alternative livelihood (AL) programmes aimed at diversifying local economies. Much of this funding, however, has failed to facilitate marked improvements, and in many cases, has exacerbated problems. This paper argues that a poor understanding of artisanal mine-community dynamics and operators’ needs has, in a number of cases, led to the design and implementation of inappropriate industry support schemes and interventions. The discussion focuses upon experiences from sub-Saharan Africa, where ASM is in the most rudimentary of states.