955 resultados para Widely distributed


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology, the second approach, a completely distributed peer-to-peer system, solves the scalability problem due to the bottleneck at the master node. However, in many real world scenarios the participating computing nodes cannot communicate directly due to administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To solve this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogenous computing resources and is applied to wide area network scenarios.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper focuses on improving computer network management by the adoption of artificial intelligence techniques. A logical inference system has being devised to enable automated isolation, diagnosis, and even repair of network problems, thus enhancing the reliability, performance, and security of networks. We propose a distributed multi-agent architecture for network management, where a logical reasoner acts as an external managing entity capable of directing, coordinating, and stimulating actions in an active management architecture. The active networks technology represents the lower level layer which makes possible the deployment of code which implement teleo-reactive agents, distributed across the whole network. We adopt the Situation Calculus to define a network model and the Reactive Golog language to implement the logical reasoner. An active network management architecture is used by the reasoner to inject and execute operational tasks in the network. The integrated system collects the advantages coming from logical reasoning and network programmability, and provides a powerful system capable of performing high-level management tasks in order to deal with network fault.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The wild common bean (Phaseolus vulgaris) is widely but discontinuously distributed from northern Mexico to northern Argentina on both sides of the Isthmus of Panama. Little is known on how the species has reached its current disjunct distribution. In this research, chloroplast DNA polymorphisms in seven non-coding regions were used to study the history of migration of wild P. vulgaris between Mesoamerica and South America. A penalized likelihood analysis was applied to previously published Leguminosae ITS data to estimate divergence times between P. vulgaris and its sister taxa from Mesoamerica, and divergence times of populations within P. vulgaris. Fourteen chloroplast haplotypes were identified by PCR-RFLP and their geographical associations were studied by means of a Nested Clade Analysis and Mantel Tests. The results suggest that the haplotypes are not randomly distributed but occupy discrete parts of the geographic range of the species. The current distribution of haplotypes may be explained by isolation by distance and by at least two migration events between Mesoamerica and South America: one from Mesoamerica to South America and another one from northern South America to Mesoamerica. Age estimates place the divergence of P. vulgaris from its sister taxa from Mesoamerica at or before 1.3 Ma, and divergence of populations from Ecuador-northern Peru at or before 0.6 Ma. As these ages are taken as minimum divergence times, the influence of past events, such as the closure of the Isthmus of Panama and the final uplift of the Andes, on the migration history and population structure of this species cannot be disregarded.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Two-stage designs offer substantial advantages for early phase II studies. The interim analysis following the first stage allows the study to he stopped for futility, or more positively, it might lead to early progression to the trials needed for late phase H and phase III. If the study is to continue to its second stage, then there is an opportunity for a revision of the total sample size. Two-stage designs have been implemented widely in oncology studies in which there is a single treatment arm and patient responses are binary. In this paper the case of two-arm comparative studies in which responses are quantitative is considered. This setting is common in therapeutic areas other than oncology. It will be assumed that observations are normally distributed, but that there is some doubt concerning their standard deviation, motivating the need for sample size review. The work reported has been motivated by a study in diabetic neuropathic pain, and the development of the design for that trial is described in detail. Copyright (C) 2008 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.